In this podcast episode, we talked with Adrian Brudaru about the past, present, and future of data engineering.

About the speaker: Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was and chose to go instead for the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At 30, he had had enough of startups and decided to join a corporation, but quickly found that it did not provide the challenge he wanted. As going back to startups was not a desirable option either, he decided to postpone the decision by taking freelance work, and he has never looked back since. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open-source tools to help democratize data engineering.

0:00 Introduction to DataTalks.Club
1:05 Discussing trends in data engineering with Adrian
2:03 Adrian's background and journey into data engineering
5:04 Growth and updates on Adrian's company, dltHub
9:05 Challenges and specialization in data engineering today
13:00 Opportunities for data engineers entering the field
15:00 The "Modern Data Stack" and its evolution
17:25 Emerging trends: AI integration and Iceberg technology
27:40 DuckDB and the emergence of portable, cost-effective data stacks
32:14 The rise and impact of dbt in data engineering
34:08 Alternatives to dbt: SQLMesh and others
35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions
37:20 Audience questions: Career focus in data roles and AI engineering overlaps
39:00 The role of semantics in data and AI workflows
41:11 Focusing on learning concepts over tools when entering the field
45:15 Transitioning from backend to data engineering: challenges and opportunities
47:48 Current state of the data engineering job market in Europe and beyond
49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats
50:40 Suitability of these formats for batch and streaming workloads
52:29 Tools for streaming: Kafka, SQS, and related trends
58:07 Building AI agents and enabling intelligent data applications
59:09 Closing discussion on the place of tools like dbt in the ecosystem
Managing security for a device that can autonomously interact with third-party services presents unique orchestration challenges that go beyond traditional IoT security models. In this episode of Detection at Scale, Matthew Domko, Head of Security at Rabbit, gives Jack an in-depth look at building security programs for AI-powered hardware at scale. He details how his team achieved 100% infrastructure-as-code coverage while maintaining the agility needed for rapid product iteration. Matt also challenges conventional approaches to scaling security operations, advocating for a serverless-first architecture that has fundamentally changed how they handle detection engineering. His insights on using private LLMs via Amazon Bedrock to analyze security events showcase a pragmatic approach to AI adoption, focusing on augmentation of existing workflows rather than wholesale replacement of human analysis.

Topics discussed:
- How transitioning from reactive SIEM operations to a data-first security approach using AWS Lambda and SQS enabled Rabbit's team to handle complex orchestration monitoring without maintaining persistent infrastructure.
- The practical implementation of LLM-assisted detection engineering, using Amazon Bedrock to analyze 15-minute blocks of security telemetry across their stack.
- A deep dive into security data lake architecture decisions, including how their team addressed the challenge of cost attribution when security telemetry becomes valuable to other engineering teams.
- The evolution from traditional detection engineering to a "detection-as-code" pipeline that leverages infrastructure-as-code for security rules, enabling version control, peer review, and automated testing of detection logic while maintaining rapid deployment capabilities.
- Concrete examples of integrating security into the engineering workflow, including how they use LLMs to transform security tickets to match engineering team nomenclature and communication patterns.
- Technical details of their data ingestion architecture using AWS SQS and Lambda, showing how two well-documented core patterns enabled the team to rapidly onboard new data sources and detection capabilities without direct security team involvement.
- A pragmatic framework for evaluating where generative AI adds value in security operations, focusing on specific use cases like log analysis and detection engineering where the technology demonstrably improves existing workflows rather than attempting wholesale process automation.

Listen to more episodes: Apple | Spotify | YouTube | Website
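The SQS-and-Lambda ingestion pattern described above can be sketched roughly as follows. This is not Rabbit's actual code; the bucket name, key layout, and normalization logic are assumptions for illustration:

```typescript
// A rough sketch of an SQS -> Lambda ingestion function feeding a security
// data lake. Not Rabbit's implementation: bucket name, key layout, and
// normalization are hypothetical.
import { SQSEvent, SQSRecord } from "aws-lambda";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Tag each event with its source queue so downstream detections can filter.
function normalize(record: SQSRecord): Record<string, unknown> {
  return { source: record.eventSourceARN, event: JSON.parse(record.body) };
}

export const handler = async (event: SQSEvent): Promise<void> => {
  // Each SQS record carries one batch of security telemetry from a producer.
  for (const record of event.Records) {
    await s3.send(
      new PutObjectCommand({
        Bucket: "security-data-lake", // hypothetical bucket
        Key: `events/${record.messageId}.json`,
        Body: JSON.stringify(normalize(record)),
      })
    );
  }
};
```

The appeal of the pattern, as discussed in the episode, is that new data sources only need to know how to drop messages on a queue; no persistent collectors are required.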
182: In this episode, we take a deep dive into the topic of embodiment. What exactly does embodiment mean, and why is it so important to include the body on the path to healing? Especially for Queens, it is essential to recognize and use their body as a valuable resource. We talk about simple exercises you can integrate into your everyday life to become more aware of your body and support your healing. In this episode, we introduce two practical exercises you can follow along with right away. For more exercises and deeper insights, check out the SQS exercises. SQS exercises: https://survivorqueens.de/sqs-survivor-queen-skil/
David Flanagan, founder of Rawkode Academy, discusses why WebAssembly (WASM) could be the future of serverless technology and explores the evolution, benefits, and potential of WASM in transforming server-side applications across various environments.

Links
- https://davidflanagan.com
- https://github.com/davidflanagan
- https://twitter.com/__DavidFlanagan
- https://www.linkedin.com/in/rawkode
- https://rawkode.academy
- https://youtube.com/@RawkodeAcademy
- https://www.hopp.bio/rawkode

We want to hear from you! How did you find us? Did you see us on Twitter? In a newsletter? Or maybe we were recommended by a friend? Let us know by sending an email to our producer, Emily, at emily.kochanekketner@logrocket.com (mailto:emily.kochanekketner@logrocket.com), or tweet at us at PodRocketPod (https://twitter.com/PodRocketpod).

Follow us. Get free stickers. Follow us on Apple Podcasts, fill out this form (https://podrocket.logrocket.com/get-podrocket-stickers), and we'll send you free PodRocket stickers!

What does LogRocket do? LogRocket provides AI-first session replay and analytics that surface the UX and technical issues impacting user experiences. Start understanding where your users are struggling by trying it for free at LogRocket.com (https://logrocket.com/signup/?pdr).

Special Guest: David Flanagan.
181: In this podcast episode, we look at trauma as a boundary violation and how important it is to feel safety and healthy boundaries. Why do so many of us have trouble with our boundaries? We talk about the extremes of too many boundaries (walls) and boundarylessness, and what causes them. Another focus is how we can restore a sense of safety in the here and now. We explain the method of titration for the nervous system and why it plays a crucial role in coping with trauma.

Questions we address in this episode:
- Why do many people have trouble with their boundaries?
- How can we restore a sense of safety in the here and now?
- What is titration for the nervous system, and why is it so important?

Get the SQS exercises: http://survivorqueens.de/sqs-survivor-queen-skills
180: In this podcast episode, we take a deep dive into orientation as a resource. How can orientation help us activate the parasympathetic nervous system and reach a state of calm and relaxation? We talk about the challenges of disorientation around triggers and in proximity to trauma, including dissociation and its effects. I also present several exercises that can help you find orientation, and I guide you through one of them directly so you can try it right away.

Questions we address in this episode:
- How does orientation support activation of the parasympathetic nervous system?
- What effects does disorientation have around triggers and trauma?
- Which exercises can help you find more orientation in everyday life?

Get the SQS exercises: http://survivorqueens.de/sqs-survivor-queen-skills
In this episode, we discuss best practices for working with AWS Lambda. We cover how Lambda functions work under the hood, including cold starts and warm starts. We then explore different invocation types - synchronous, asynchronous, and event-based. For each, we share tips on performance, cost optimization, and monitoring. Other topics include function structure, logging, instrumentation, and security. Throughout the episode, we aim to provide a solid mental model for serverless development and share our experiences to help you build efficient and robust Lambda applications.
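As a companion to the invocation-types discussion, here is a minimal sketch contrasting synchronous and asynchronous invocation with the AWS SDK for JavaScript v3. The function name "my-function" is hypothetical:

```typescript
// Synchronous (RequestResponse) vs asynchronous (Event) Lambda invocation.
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

async function main() {
  // Synchronous: the caller blocks until the function returns and receives
  // the response payload (and any function error) directly.
  const sync = await lambda.send(
    new InvokeCommand({
      FunctionName: "my-function",
      InvocationType: "RequestResponse",
      Payload: JSON.stringify({ ping: true }),
    })
  );
  console.log(sync.StatusCode); // 200 on success; response in sync.Payload

  // Asynchronous: Lambda queues the event and returns immediately; retries
  // and failure destinations are handled by the service, not the caller.
  const asyncResult = await lambda.send(
    new InvokeCommand({
      FunctionName: "my-function",
      InvocationType: "Event",
      Payload: JSON.stringify({ ping: true }),
    })
  );
  console.log(asyncResult.StatusCode); // 202: accepted for async execution
}

main().catch(console.error);
```

The 200-versus-202 status codes are a quick way to confirm which invocation path you actually hit when debugging.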
In this episode, I spoke with Waldemar Hummer, founder and CTO of LocalStack. We discussed what's new in the latest version of LocalStack and highlighted some of the most interesting additions.

One particular highlight for me is the ability to identify IAM permission errors between direct service integrations, for example, when an EventBridge pipe cannot deliver a message to an SQS target. Another is the ability to use test runs to generate the necessary IAM permissions so they can be added to your Lambda functions.

LocalStack v3 also allows running chaos experiments locally by adding random latency spikes, making an entire AWS region unavailable, or simulating DynamoDB throughput-exceeded errors.

Lots of exciting new features in LocalStack v3! Waldemar gave us a live demo of some of these features. You can watch the episode on YouTube and watch the demos here.

Links from the episode:
- My webinar with Waldemar after LocalStack v2
- LocalStack v3 announcement
- My blog post on when to use Step Functions vs running everything in Lambda
- LocalStack is hiring!

Opening theme song: Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0
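The episode doesn't walk through setup, but the usual pattern for exercising LocalStack from code is to override the SDK endpoint. A minimal sketch, assuming LocalStack's default edge port and a hypothetical SQS queue:

```typescript
// Pointing the AWS SDK at LocalStack instead of real AWS. LocalStack's edge
// endpoint defaults to http://localhost:4566; credentials are dummy values.
import {
  SQSClient,
  CreateQueueCommand,
  SendMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({
  region: "us-east-1",
  endpoint: "http://localhost:4566", // route all SQS calls to LocalStack
  credentials: { accessKeyId: "test", secretAccessKey: "test" },
});

async function main() {
  const { QueueUrl } = await sqs.send(
    new CreateQueueCommand({ QueueName: "demo-queue" })
  );
  if (!QueueUrl) throw new Error("queue creation failed");
  await sqs.send(
    new SendMessageCommand({ QueueUrl, MessageBody: "hello from LocalStack" })
  );
  console.log(`sent message to ${QueueUrl}`);
}

main().catch(console.error);
```

The same endpoint override works for the other services LocalStack emulates, which is what makes features like the local IAM checks and chaos experiments discussed above usable from an unmodified test suite.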
Summary
Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying of high-scale log data for security auditing. In this episode he shares the story of how it got started, how it works, and how you can get started with it.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs, ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey, and today I'm interviewing Cliff Crosland about Scanner, a security data lake platform for analyzing security logs and identifying issues quickly and cost-effectively.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Scanner is and the story behind it?
- What were the shortcomings of other tools that are available in the ecosystem?
- What is Scanner explicitly not trying to solve for in the security space? (e.g. SIEM)
- A query engine is useless without data to analyze. What are the data acquisition paths/sources that you are designed to work with? (e.g. CloudTrail logs, app logs, etc.)
- What are some of the other sources of signal for security monitoring that would be valuable to incorporate or integrate with through Scanner?
- Log data is notoriously messy, with no strictly defined format. How do you handle introspection and querying across loosely structured records that might span multiple sources and inconsistent labelling strategies?
- Can you describe the architecture of the Scanner platform?
- What were the motivating constraints that led you to your current implementation?
- How have the design and goals of the product changed since you first started working on it?
- Given the security-oriented customer base that you are targeting, how do you address trust/network boundaries for compliance with regulatory/organizational policies?
- What are the personas of the end-users for Scanner?
- How has that influenced the way that you think about the query formats, APIs, user experience, etc. for the product?
- For teams who are working with Scanner, can you describe how it fits into their workflow?
- What are the most interesting, innovative, or unexpected ways that you have seen Scanner used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Scanner?
- When is Scanner the wrong choice?
- What do you have planned for the future of Scanner?
Contact Info
- LinkedIn (https://www.linkedin.com/in/cliftoncrosland/)

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.

Links
- Scanner (https://scanner.dev/)
- cURL (https://curl.se/)
- Rust (https://www.rust-lang.org/)
- Splunk (https://www.splunk.com/)
- S3 (https://aws.amazon.com/s3/)
- AWS Athena (https://aws.amazon.com/athena/)
- Loki (https://grafana.com/oss/loki/)
- Snowflake (https://www.snowflake.com/en/)
- Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/)
- Presto (https://prestodb.io/)
- Trino (https://trino.io/)
- AWS CloudTrail (https://aws.amazon.com/cloudtrail/)
- GitHub Audit Logs (https://docs.github.com/en/organizations/keeping-your-organization-secure/managing-security-settings-for-your-organization/reviewing-the-audit-log-for-your-organization)
- Okta (https://www.okta.com/)
- Cribl (https://cribl.io/)
- Vector.dev (https://vector.dev/)
- Tines (https://www.tines.com/)
- Torq (https://torq.io/)
- Jira (https://www.atlassian.com/software/jira)
- Linear (https://linear.app/)
- ECS Fargate (https://aws.amazon.com/fargate/)
- SQS (https://aws.amazon.com/sqs/)
- Monoid (https://en.wikipedia.org/wiki/Monoid)
- Group Theory (https://en.wikipedia.org/wiki/Group_theory)
- Avro (https://avro.apache.org/)
- Parquet (https://parquet.apache.org/)
- OCSF (https://github.com/ocsf/)
- VPC Flow Logs (https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
Amir Szekely, Owner at CloudSnorkel, joins Corey on Screaming in the Cloud to discuss how he got his start in the early days of cloud and his solo project, CloudSnorkel. Throughout this conversation, Corey and Amir discuss the importance of being pragmatic when moving to the cloud, and the different approaches they see in developers from the early days of cloud to now. Amir shares what motivates him to develop open-source projects, and why he finds fulfillment in fixing bugs and operating CloudSnorkel as a one-man show.

About Amir
Amir Szekely is a cloud consultant specializing in deployment automation, AWS CDK, CloudFormation, and CI/CD. His background includes security, virtualization, and Windows development. Amir enjoys creating open-source projects like cdk-github-runners, cdk-turbo-layers, and NSIS.

Links Referenced:
- CloudSnorkel: https://cloudsnorkel.com/
- lasttootinaws.com: https://lasttootinaws.com
- camelcamelcamel.com: https://camelcamelcamel.com
- github.com/cloudsnorkel: https://github.com/cloudsnorkel
- Personal website: https://kichik.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn, and this is an episode that I have been angling for for longer than you might imagine. My guest today is Amir Szekely, who's the owner at CloudSnorkel. Amir, thank you for joining me.

Amir: Thanks for having me, Corey. I love being here.

Corey: So, I've been using one of your open-source projects for an embarrassingly long amount of time, and for the longest time, I make the critical mistake of referring to the project itself as CloudSnorkel because that's the word that shows up in the GitHub project that I can actually see that jumps out at me. The actual name of the project within your org is cdk-github-runners if I'm not mistaken.

Amir: That's real original, right?

Corey: Exactly. It's like, "Oh, good, I'll just mention that, and suddenly everyone will know what I'm talking about." But ignoring the problems of naming things well, which is a pain that everyone at AWS or who uses it knows far too well, the product is basically magic. Before I wind up basically embarrassing myself by doing a poor job of explaining what it is, how do you think about it?

Amir: Well, I mean, it's a pretty simple project, which I think is what makes it great as well. It creates GitHub runners with CDK. That's about it. It's in the name, and it just does that. And I really tried to make it as simple as possible and kind of learn from other projects that I've seen that are similar, and basically learn from my pain points in them.

I think the reason I started is because I actually deployed CDK runners—sorry, GitHub runners—for one company, and I ended up using the Kubernetes one, right? So, GitHub in themselves, they have two projects they recommend—and not to nudge GitHub, please recommend my project one day as well—they have the Kubernetes controller and they have the Terraform deployer. And the specific client that I worked for, they wanted to use Kubernetes. And I tried to deploy it, and, Corey, I swear, I worked three days; three days to deploy the thing, which was crazy to me.
And every single step of the way, I had to go and read some documentation, figure out what I did wrong, and apparently the order in the documentation was incorrect. And I had to—I even opened tickets, and they—you know, they were rightfully like, "It's an open-source project. Please contribute and fix the documentation for us." At that point, I said, "Nah." [laugh]. Let me create something better with CDK, and I decided just to have the simplest setup possible.

So usually, right, what you end up doing in these projects, you have to set up either secrets or SSM parameters, and you have to prepare the ground and you have to get your GitHub token and all those things. And that's just annoying. So, I decided to create a—

Corey: So much busy work.

Amir: Yes, yeah, so much busy work and so much boilerplate and so much figuring out the right way and the right order, and just annoying. So, I decided to create a setup page. I thought, "What if you can actually install it just like you install any app on GitHub," which is the way it's supposed to be, right? So, when you install cdk-github-runners—CloudSnorkel—you get an HTML page and you just click a few buttons and you tell it where to install it and it just installs it for you. And it sets the secrets and everything. And if you want to change the secret, you don't have to redeploy. You can just change the secret, right? You have to roll the token over or whatever. So, it's much, much easier to install.

Corey: And I feel like I discovered this project through one of the more surreal approaches—and I had cause to revisit it a few weeks ago when I was redoing my talk for the CDK Community Day, which has since happened and people liked the talk—and I mentioned what CloudSnorkel had been doing and how I was using the runners accordingly. So, that was what accidentally caused me to pop back up with, "Hey, I've got some issues here." But we'll get to that. Because once upon a time, I built a Twitter client for creating threads because shitposting is my love language, and I would sit and create Twitter threads in the middle of live keynote talks. Threading in the native client was always terrible, and I wanted to build something that would help me do that. So, I did.

And it was up for a while. It's not anymore because I'm not paying $42,000 a month in API costs to some jackass, but it still exists in the form of lasttootinaws.com if you want to create threads on Mastodon. But after I put this out, some people complained that it was slow.

To which my response was, "What do you mean? It's super fast for me in San Francisco talking to it hosted in Oregon." But on every round trip from halfway around the world, it became a problem. So, I got it into my head that since this thing was fully stateless, other than a Lambda function being fronted via an API Gateway, that I should deploy it to every region. It didn't quite fit into a Cloudflare Worker or into one of the Edge Lambda functions that AWS has given up on, but okay, how do I deploy something to every region?

And the answer is, with great difficulty because it's clear that no one was ever imagining with all those regions that anyone would use all of them. It's imagined that most customers use two or three, but customers are different, so which two or three is going to be widely varied. So, anything halfway sensible about doing deployments like this didn't work out.
Again, because this thing was also a Lambda function and an API Gateway, it was dirt cheap, so I didn't really want to start spending stupid amounts of money doing deployment infrastructure and the rest.

So okay, how do I do this? Well, GitHub Actions is awesome. It is basically what all of AWS's code offerings wish that they were. CodeBuild is sad and this was kind of great. The problem is, once you're out of the free tier, and if you're a bad developer where you do a deploy on every iteration, suddenly it starts costing, for what I was doing in every region, something like a quarter per deploy, which adds up when you're really, really bad at programming.

Amir: [laugh].

Corey: So, their matrix jobs are awesome, but I wanted to do some self-hosted runners. How do I do that? And I want to keep it cheap, so how do I do a self-hosted runner inside of a Lambda function? Which led me directly to you. And it was nothing short of astonishing. This was a few years ago. I seem to recall that it used to be a bit less well-architected in terms of its elegance. Did it always use Step Functions, for example, to wind up orchestrating these things?

Amir: Yeah, so I do remember that day. We met pretty much… basically as a joke because the Lambda runner was a joke that I did, and I posted on Twitter, and I was half-proud of my joke that starts in ten seconds, right? But yeah, no, the—I think it always used Step Functions. I've been kind of in love with Step Functions for the past two years. They just—they're nice.

Corey: Oh, they're magic, and AWS is so bad at telling their story. Both of those things are true.

Amir: Yeah. And the API is not amazing. But like, when you get it working—and you know, you have to spend some time to get it working—it's really nice because then you have nothing to manage, ever. And they can call APIs directly now, so you don't have to even create Lambdas. It's pretty cool.

Corey: And what I loved is you wind up deploying this thing to whatever account you want it to live within. What is it, the OIDC? I always get those letters in the wrong direction. OIDC, I think, is correct.

Amir: I think it's OIDC, yeah.

Corey: Yeah, and it winds up doing this through a secure method as opposed to just okay, now anyone with access to the project can deploy into your account, which is not ideal. And it just works. It spins up a whole bunch of these Lambda functions that are using a Docker image as the deployment environment. And yeah, all right, if effectively my CDK deploy—which is what it's doing inside of this thing—doesn't complete within 15 minutes, then it's not going to, and the thing is going to break out. We've solved the halting problem. After 15 minutes, the loop will terminate. The end.

But that's never been a problem, even with getting ACM certificates spun up. It completes well within that time limit. And its cost to me is effectively nothing. With one key exception: that you made the choice to use Secrets Manager to wind up storing a lot of the things it cares about instead of Parameter Store, so I think you wind up costing me—I think there's two of those different secrets, so that's 80 cents a month. Which I will be demanding in blood one of these days if I ever catch you at re:Invent.

Amir: I'll buy you beer [laugh].

Corey: There we go. That'll count. That'll buy, like, several months of that. That works—at re:Invent, no. The beers there are, like, $18, so that'll cover me for years. We're set.

Amir: We'll split it [laugh].

Corey: Exactly. Problem solved.
But I like the elegance of it, I like how clever it is, and I want to be very clear, though, it's not just for shitposting. Because it's very configurable where, yes, you can use Lambda functions, you can use Spot Instances, you can use CodeBuild containers, you can use Fargate containers, you can use EC2 instances, and it just automatically orchestrates and adds these self-hosted runners to your account, and every build gets a pristine environment as a result. That is no small thing.

Amir: Oh, and I love making things configurable. People really appreciate it, I feel, you know, and it gives people kind of a sense of power. But as long as you make that configuration simple enough, right, or at least the defaults good defaults, right, then, even with that power, people still don't shoot themselves in the foot and it still works really well. By the way, we just added ECS recently, which people really were asking for because it gives you the, kind of, easy option to have the runner—well, not the runner but at least the runner infrastructure staying up, right? So, you can have an auto-scaling group backing ECS and then the runner can start up a lot faster. It was actually very important to other people because Lambda, as fast as it is, it's limited, and Fargate, for whatever reason, still to this day, takes a minute to start up.

Corey: Yeah. What's wild to me about this is, start to finish, I hit a deploy to the main branch and it sparks the thing up, runs the deploy. The deploy itself takes a little over two minutes. And every time I do this, within three minutes of me pushing to commit, the deploy is done globally. It is lightning fast.

And I know it's easy to lose yourself in the idea of this being a giant shitpost, where, oh, who's going to do deployment jobs in Lambda functions? Well, kind of a lot of us for a variety of reasons, some of which might be better than others. In my case, it was just because I was cheap, but the massive parallelization ability to do 20 simultaneous deploys in a matrix configuration that doesn't wind up smacking into rate limits everywhere, that was kind of great.

Amir: Yeah, we have seen people use Lambda a lot. It's mostly for, yeah, like you said, small jobs. And the environment that they give you, it's kind of limited, so you can't actually install packages, right? There is no sudo, and you can't actually install anything unless it's in your temp directory. But still, like, just being able to run a lot of little jobs, it's really great. Yeah.

Corey: And you can also make sure that there's a Docker image ready to go with the stuff that you need, just by configuring how the build works in the CDK. I will admit, I did have a couple of bug reports for you. One was kind of useful, where it was not at all clear how to do this on top of a Graviton-based Lambda function—because yeah, that was back when not everything really supported ARM architectures super well—and a couple of other times when the documentation was fairly ambiguous from my perspective, where it wasn't at all clear what I was doing. I spent four hours trying to beat my way through it, gave up, filed an issue, went to get a cup of coffee, came back, and the answer was sitting there waiting for me because I'm not convinced you sleep.

Amir: Well, I am a vampire. My last name is from the Transylvania area [laugh]. So—

Corey: Excellent. Excellent.

Amir: By the way, not the first time people tell me that.
But anyway [laugh].

Corey: There's something to be said for getting immediate responsiveness because one of the reasons I'm always so loath to go and do a support ticket anywhere is this is going to take weeks. And then someone's going to come back with a, "I don't get it." And try and, like, read the support portfolio to you. No, you went right into yeah, it's this. Fix it and your problem goes away. And sure enough, it did.

Amir: The escalation process that some companies put you through is very frustrating. I mean, lucky for you, CloudSnorkel is a one-man show and this man loves solving bugs. So [laugh].

Corey: Yeah. Do you know of anyone using it for anything that isn't ridiculous and trivial like what I'm using it for?

Amir: Yeah, I have to think whether or not I can… I mean, so—okay. We have a bunch of dedicated users, right, on the GitHub repo, that keep posting bugs and keep posting even patches, right, so you can tell that they're using it. I even have one sponsor, one recurring sponsor on GitHub that uses it.

Corey: It's always nice when people thank you via money.

Amir: Yeah. Yeah, it is very validating. I think [BLEEP] is using it, but I also don't think I can actually say it because I got it from the GitHub.

Corey: It's always fun. That's the beautiful part about open-source. You don't know who's using this. You see what other things people are working on, and you never know, is one of their—is this someone's side project, is it a skunkworks thing, or God forbid, is this inside of every car going forward and no one bothered to tell me about that. That is the magic and mystery of open-source. And you've been doing open-source for longer than I have and I thought I was old. You were originally named in some of the WinAMP credits, for God's sake, that media player that really whipped the llama's ass.

Amir: Oh, yeah, I started real early. I started about when I was 15, I think. I started off with Pascal or something or even Perl, and then I decided I have to learn C and I have to learn the Windows API. I don't know what possessed me to do that. Win32 API is… unique [laugh].

But once I created those applications for myself, right, I think there was—oh my God, do you know—what is it called, Sherlock in macOS, right? And these days, for PowerToys, there is the equivalent of it called, I don't know, whatever that—PowerBar? That's exactly—that was that. That's a project I created as a kid. I wanted something where I can go to the Run menu of Windows when you hit Winkey-R, and you can just type something and it will start it up, right?

I didn't want to go to the Start menu and browse and click things. I wanted to do everything with the keyboard. So, I created something called Blazerun [laugh], which [laugh] helped you really easily create shortcuts that went into your path, right, the Windows path, so you can really easily start them from Winkey-R. I don't think that anyone besides me used it, but anyway, that thing needed an installer, right? Because Windows, you got to install things. So, I ended up—

Corey: Yeah, these days on macOS, I use Alfred for that, which is kind of long in the tooth, but there's LaunchBar and a bunch of other stuff for it. What I love is that if I—I can double-tap the command key and that just pops up whatever I need it to and tell the computer what to do. It feels like there's an AI play in there somewhere if people can figure out how to spend ten minutes on building AI that does something other than lets them fire their customer service staff.

Amir: Oh, my God.
Please don't fire customer service staff. AI is so bad.

Corey: Yeah, when I reach out to talk to a human, I really needed a human.

Amir: Yes. Like, I'm not calling you because I want to talk to a robot. I know there's a website. Leave me alone, just give me a person.

Corey: Yeah. Like, you already failed to solve my problem on your website. It's person time.

Amir: Exactly. Oh, my God. Anyway [laugh]. So, I had to create an installer, right, and I found it was called NSIS. So, it was the Nullsoft "SuperPiMP" installation system. Or in the future, when Justin, the guy who created Winamp and NSIS, tried to tone it down a little bit, Nullsoft Scriptable Installation System. And SuperPiMP is—this is such useless history for you, right, but SuperPiMP is the next generation of PiMP, which is Plug-in Mini Packager [laugh].

Corey: I remember so many of the—like, these days, no one would ever name any project like that, just because it's so off-putting to people with sensibilities, but back then that was half the stuff that came out. "Oh, you don't like how this thing I built for free in the wee hours when I wasn't working at my fast food job wound up—you know, like, how I chose to name it, well, that's okay. Don't use it. Go build your own. Oh, what, you're using it anyway. That's what I thought."

Amir: Yeah. The source code was filled with profanity, too. And like, I didn't care, I really did not care, but some people would complain and open bug reports and patches. And my policy was kind of like, okay, if you're complaining, I'm just going to ignore you. If you're opening a patch, fine, I'm going to accept that you're—you guys want to create something that's sensible for everybody, sure.

I mean, it's just source code, you know? Whatever. So yeah, I started working on that NSIS. I used it for myself and I joined the forums—and this kind of answers your question of why I respond to things so fast, just because of the fun—I did the same when I was 15, right? I started going on the forums. You remember forums? You remember that [laugh]?

Corey: Oh, yeah, back before they all became terrible and monetized.

Amir: Oh, yeah. So, you know, people were using NSIS, too, and they had requests, right? Back in the day—what was it—there was only support for 16-bit colors for the icon, so they wanted 32-bit colors and big icons—sorry, 32 pixels by 32 pixels. Remember, 32 pixels?

Corey: Oh, yes. Not well, and not happily, but I remember it.

Amir: Yeah. So, I started just, you know, giving people—working on that open-source and creating a fork. It wasn't even called 'fork' back then, but yeah, I created, like, a little fork of myself and I started adding all these features. And people were really happy, and it kind of created, like, this happy cycle for myself: when people were happy, I was happy coding. And then people were happy with what I was coding. And then they were asking for more and they were getting happier, the more I responded.

So, it was kind of like a serotonin cycle that made me happy and made everybody happy. So, it's like a win, win, win, win, win. And that's how I started with open-source. And eventually… NSIS—again, that installation system—got so big, like, my fork got so big, and Justin, the guy who works on WinAMP and NSIS, he had other things to deal with. You know, there's a whole history there with AOL. I'm sure you've heard all the funny stories.

Corey: Oh, yes.
In fact, one thing that—you want to talk about weird collisions of things crossing, one of the things I picked up from your bio when you finally got tired of telling me no and agreed to be on the show was that you're also one of the team who works on camelcamelcamel.com. And I keep forgetting that's one of those things that most people have no idea exists. But it's very simple: all it does is it tracks Amazon products that you tell it to and alerts you when there's a price drop on the thing that you're looking at.

It's something that is useful. I try and use it for things of substance or hobbies because I feel really pathetic when I get excited emails about a price drop in toilet paper. But you know, it's very handy just to keep an eye on price history, where okay, am I actually being ripped off? Oh, they claim it's their big Amazon Deals day and this is 40% off. Let's see what camelcamelcamel has to say.

Oh, surprise. They just jacked the price right beforehand and now knocked 40% off. Genius. I love that. It always felt like something that was going to be blown off the radar by Amazon being displeased, but I discovered you folks in 2010 and here you are now, 13 years later, still here. I will say the website looks a lot better now.

Amir: [laugh]. That's a recent change. I actually joined camel maybe two or three years ago. I wasn't there from the beginning. But I knew the guy who created it—again, as you were saying—from the Winamp days, right? So, we were both working in the free—well, it wasn't freenode. It was not freenode. It was a separate IRC server that, again, Justin created for himself. It was called landoleet.

Corey: Mmm. I never encountered that one.

Amir: Yeah, no, it was pretty private. The only people that cared about WinAMP and NSIS ended up joining there. But it was a lot of fun. I met a lot of friends there. And yeah, I met Daniel Green there as well, and he's the guy that created it, along with some other people in there that I think want to remain anonymous, so I'm not going to mention them, but they also were on the camel project.

And yeah, I was kind of doing my poor version of shitposting on Twitter about AWS, kind of starting to get some traction and maybe some clients and talk about AWS so people can approach me, and Daniel approached me out of the blue and he was like, "Do you just post about AWS on Twitter or do you also do some AWS work?" I was like, "I do some AWS work."

Corey: Yes, as do all of us. It's one of those, well crap, we're getting called out now. "Do you actually know how any of this stuff works?" Like, "Much to my everlasting shame, yes. Why are you asking?"

Amir: Oh, my God, no, I cannot fix your printer. Leave me alone.

Corey: Mm-hm.

Amir: I don't want to fix your Lambdas. No, but I do actually want to fix your Lambdas. And so, [laugh] he approached me and he asked if I can help them move camelcamelcamel from their data center to AWS. So, that was a nice big project. So, we moved, actually, all of camelcamelcamel into AWS. And this is how I found myself not only in the Winamp credits, but also on the camelcamelcamel credits page, which has a great picture of me riding a camel.

Corey: Excellent. But one of the things I've always found has been that when you take an application that has been pre-existing for a while in a data center and then move it into the cloud, you suddenly have to care about things that no one sensible pays any attention to in the land of the data center.
Because it's like, "What do I care about how much data passes between my application server and the database? Wait, what do you mean that in this configuration, that's a chargeable data transfer? Oh, dear Lord." And things that you've never had to think about optimizing are suddenly things you are very much optimizing.

Because let's face it, when it comes to putting things in racks and then running servers, you aren't auto-scaling those things, so everything tends to be running over-provisioned, for very good reasons. It's an interesting education. Anything you picked out from that process that you think would be useful for folks to bear in mind if they're staring down the barrel of the same thing?

Amir: Yeah, for sure. I think… in general, right, not just here. But in general, you always want to be pragmatic, right? You don't want to take steps that are huge, right? So, the thing we did was not necessarily rewrite everything and change everything to AWS and move everything to Lambda and move everything to Docker.

Basically, we did a mini lift-and-shift, but not exactly lift-and-shift, right? We didn't take it as is. We moved to RDS, we moved to ElastiCache, right, we obviously made use of security groups and session connect and we dropped SSH usage and we improved the security a lot and we locked everything down, all the permissions and all that kind of stuff, right? But like you said, there's stuff that you start having to pay attention to. In our case, it was less the data transfer because we have a pretty good CDN. It was more about IOPS. So—and IOPS, specifically for a database.

We had a huge database with about one terabyte of data, and a lot of it is that price history that you see, right? So, all those nice little graphs that we create in—what do you call them, charts—that we create in camelcamelcamel off the price history. There's a lot of data behind that. And what we always wanted to do was actually remove that from MySQL, which had been kind of struggling with it even before the move to AWS, but after the move to AWS, where everything was no longer over-provisioned and we couldn't just buy a few more NVMes on Amazon for 100 bucks when they were on sale—back when we had to pay Amazon—

Corey: And you know when they're on sale. That's the best part.

Amir: And we know [laugh]. We get good prices on NVMe. But yeah, on Amazon—on AWS, sorry—you have to pay for io1 or something, and that adds up real quick, as you were saying. So, part of that move was also to move to something that was a little better for that data structure. And we actually moved just that data, the price history, the price points, from MySQL to DynamoDB, which was a pretty nice little project.

Actually, I wrote about it in my blog. There are, kind of, lessons learned from moving one terabyte from MySQL to DynamoDB, and I think the biggest lesson was about the hidden price of storage in DynamoDB. But before that, I want to talk about what you asked, which was the way that other people should make that move, right? So again, be pragmatic, right? If you Google, "How do I move stuff from MySQL to DynamoDB," everybody's always talking about their cool project using Lambda and how you throttle Lambda and how you get throttled by DynamoDB and how you set it up with SQS, and this and that. You don't need all that.

Just fire up an EC2 instance, write some quick code to do it. I used, I think it was Go with some limiter code from Uber, and that was it. And you don't need all those Lambdas and SQS and the complication.
That thing was a one-time thing anyway, so it doesn't need to be super… super-duper serverless, you know?

Corey: That is almost always the way that it tends to play out. You encounter these weird little things along the way. And you see so many things that are tied to this is how architecture absolutely must be done. And oh, you're not a real serverless person if you don't have everything running in Lambda and the rest. There are times where yeah, spin up an EC2 box, write some relatively inefficient code in ten minutes and just do the thing, and then turn it off when you're done. Problem solved. But there's such an aversion to that. It's nice to encounter people who are pragmatists more than they are zealots.

Amir: I mostly learned that lesson. And both Daniel Green and me learned that lesson from the Winamp days. Because we both have written plugins for Winamp and we've been around that area, and you can… if you took one of those non-pragmatist people, right, and you had them review the Winamp code right now—or even before—they would have a million things to say. That code—and NSIS, too, by the way—was so optimized. It was not necessarily readable, right? But it worked, and it worked amazingly. And Justin would—if you think I respond quickly, right, Justin Frankel, the guy who wrote Winamp, he would release versions of NSIS and of Winamp, like, four versions a day, right? That was before [laugh] you had CI/CD systems and GitHub and stuff. That was just CVS. You remember CVS [laugh]?

Corey: Oh, I've done multiple CVS migrations. One to Git and a couple to Subversion.

Amir: Oh yeah, Subversion. Yep. Done 'em all. CVS to Subversion to Git. Yep. Yep. That was fun.

Corey: And these days, everyone's using Git because it—we're beginning to have a monoculture.

Amir: Yeah, yeah. I mean, but Git is nicer than Subversion, for me, at least. I've had more fun with it.

Corey: Talk about damning with faint praise.

Amir: Faint?

Corey: Yeah, anything's better than Subversion, let's be honest here.

Amir: Oh [laugh].

Corey: I mean, realistically, copying a bunch of files and directories to a .bak folder is better than Subversion.

Amir: Well—

Corey: At least these days. But back then it was great.

Amir: Yeah, I mean, the only thing you had, right [laugh]?

Corey: [laugh].

Amir: Anyway, achieving great things with not necessarily the right tools, but just sheer power of will, that's what I took from the Winamp days. Just the entire world used Winamp. And by the way, the NSIS project that I was working on, right, I always used to joke that every computer in the world ran my code, every Windows computer in the world ran my code, just because—

Corey: Yes.

Amir: So many different companies use NSIS. And none of them cared that the code was not very readable, to put it mildly.

Corey: So many companies founder on those shores where they lose sight of the fact that I can point to basically no companies that died because their code was terrible, yet an awful lot that died with great-looking code because they didn't nail the business problem.

Amir: Yeah. I would be lying if I said that I nailed exactly the business problem at NSIS because most of the time I spent there was actually on shrinking the stub, right, that was appended to your installer data. So, there's a little stub that came—the executable, basically, that came before your data that was extracted. I spent, I want to say, years of my life [laugh] just shrinking it down by bytes—by literal bytes—just so it stays under 34, 35 kilobytes.
It was kind of a—it was a challenge and something that people appreciated, but not necessarily the thing that people appreciated the most. I think the features—

Corey: Well, no, I have to do the same thing to make sure something fits into a Lambda deployment package. The scale changes, the problem changes, but somehow everything sort of rhymes with history.

Amir: Oh, yeah. I hope you don't have to disassemble code to do that, though, because that's uh… I mean, it was fun. It was just a lot.

Corey: I have to ask, how much work went into building your cdk-github-runners as far as getting it to a point of just working out the door? Because I look at that and it feels like there's—like, the early versions, yeah, there wasn't a whole bunch of code tied to it, but geez, the iterative, "How exactly does this ridiculous Step Functions API work or whatnot," feels like I'm looking at weeks of frustration. At least it would have been for me.

Amir: Yeah, yeah. I mean, it wasn't, like, a day or two. It was definitely not—but it was not years, either. I've been working on it I think about a year now. Don't quote me on that. But I've put a lot of time into it. So, you know, like you said, the skeleton code is pretty simple: it's a step function, which as we said, takes a long time to get right. The functions, they are really nice, but their definition language is not very straightforward. But beyond that, right, once that part worked, it worked. Then came all the bug reports and all the little corner cases, right? We—

Corey: Hell is other people's use cases. Always is. But that's honestly better than a lot of folks wind up experiencing where they'll put an open-source project up and no one ever knows. So, getting users is often one of the biggest barriers to a lot of this stuff. I've found countless hidden gems lurking around on GitHub with a very particular search for something that no one had ever looked at before, as best I can tell.

Amir: Yeah.

Corey: Open-source is a tricky thing. There needs to be marketing brought into it, there needs to be storytelling around it, and it has to actually—dare I say—solve a problem someone has.

Amir: I mean, I have many open-source projects like that, that I find super useful, I created for myself, but no one knows. I think cdk-github-runners, I'm pretty sure people know about it only because you talked about it on Screaming in the Cloud or your newsletter. And by the way, thank you for telling me that you talked about it last week in the conference because now we know why there was a spike [laugh] all of a sudden. People Googled it.

Corey: Yeah. I put links to it as well, but it's the, yeah, I use this a lot and it's great. I gave a crappy explanation of how it works, but that's the trick I've found between conference talks and, dare I say, podcast episodes: you give people a glimpse and a hook and tell them where to go to learn more. Otherwise, you're trying to explain every nuance and every intricacy in 45 minutes. And you can't do that effectively in almost every case. All you're going to do is drive people away. Make it sound exciting, get them to see the value in it, and then let them go.

Amir: You have to explain the market for it, right? That's it.

Corey: Precisely.

Amir: And I got to say, I somewhat disagree with your—or I have a different view when you say that, you know, open-source projects need marketing and all those things. It depends on what open-source is for you, right? I don't create open-source projects so they are successful, right?
It's obviously always nicer when they're successful, but—and I do get that cycle of happiness that, like I was saying, people create bugs and I have to fix them and stuff, right? But not every open-source project needs to be a success. Sometimes it's just fun.

Corey: No. When I talk about marketing, I'm talking about exactly what we're doing here. I'm not talking take out an AdWords campaign or something horrifying like that. It's you build something that solved the problem for someone. The big problem that worries me about these things is how do you not lose sleep at night about the fact that you solved someone's problem and they don't know that it exists?

Because that drives me nuts. I've lost count of the number of times I've been beating my head against a wall and asked someone like, "How would you handle this?" Like, "Oh, well, what's wrong with this project?" "What do you mean?" "Well, this project seems to do exactly what you want it to do." And no one has it all stuffed in their head. But yeah, then it seems like open-source becomes a little more corporatized and it becomes a lead-gen tool for people to wind up selling their SaaS services or managed offerings or the rest.

Amir: Yeah.

Corey: And that feels like the increasing corporatization of open-source that I'm not a huge fan of.

Amir: Yeah. I mean, I'm not going to lie, right? Like, part of why I created this—or I don't know if it was part of it, but like, I had a dream that, you know, I'm going to get, oh, tons of GitHub sponsors, and everybody's going to use it and I can retire on an island and just make money out of this, right? Like, that's always a dream, right? But it's a dream, you know?

And I think bottom line open-source is… just a tool, and some people use it for, like you were saying, driving sales into their SaaS, some people, like, may use it just for fun, and some people use it for other things. Or some people use it for politics, even, right? There's a lot of politics around open-source.

I got to tell you a story. Back in the NSIS days, right—talking about politics—so this is not even about politics of open-source. People made NSIS a battleground for their politics. We would have translations, right? People could upload their translations. And I, you know, or other people that worked on NSIS, right, we don't speak every language of the world, so there's only so much we can do about figuring out if it's a real translation, if it's good or not.

Back in the day, Google Translate didn't exist. Like, these days, we check Google Translate, we kind of ask a few questions to make sure they make sense. But back in the day, we did the best that we could. At some point, we got a patch for the Catalan language—I'm probably mispronouncing it—but the separatist people in Spain, I think, and I didn't know anything about that. I was a young kid and… I just didn't know.

And I just included it, you know? Someone submitted a patch, they worked hard, they wanted to be part of the open-source project. Why not? Sure, I included it. And then a few weeks later, someone from Spain wanted to change Catalan into Spanish to make sure that doesn't exist for whatever reason.

And then they just started fighting with each other and started making demands of me. Like, you have to do this, you have to do that, you have to delete that, you have to change the name. And I was just so baffled by why would someone fight so much over a translation of an open-source project.
Like, these days, I kind of get what they were getting at, right?

Corey: But they were so bad at telling that story that it just came across as, basically, "Screw you for helping."

Amir: Yeah, screw you for helping. You're a pawn now. Just—you're a pawn unwittingly. Just do what I say and help me in my political cause. I ended up just telling both of them if you guys can't agree on anything, I'm just going to remove both translations. And that's what I ended up doing. I just removed both translations. And then a few months later—because we had a release every month, basically—I just added both of them back and I've never heard from them again. So, sort of problem solved. Peace in the Middle East? I don't know.

Corey: It's kind of wild just to see how often that sort of thing tends to happen. It's a, I don't necessarily understand why folks are so opposed to other people trying to help. I think they feel like there's this loss of control as things are slipping through their fingers, but it's a really unwelcoming approach. One of the things that got me deep into the open-source ecosystem surprisingly late in my development was when I started pitching in on the SaltStack project right after it was founded, where suddenly everything I threw their way was merged, and then Tom Hatch, the guy who founded the project, would immediately fix all the bugs and stuff I put in and then push something else immediately thereafter. But it was such a welcoming thing.

Instead of nitpicking me to death in the pull request, it just got merged in and then silently fixed. And I thought that was a classy way to do it. Of course, it doesn't scale and of course, it causes other problems, but I envy the simplicity of those days and just the ethos behind that.

Amir: That's something I've learned in the last few years, I would say. Back in the NSIS days, I was not like that. I nitpicked. I nitpicked a lot. And I can guess why, but it just—you create a patch—in my mind, right, like, you create a patch, you fix it, right?

But these days I get it. I've been on the other side as well, right? Like, I created patches for open-source projects and I've seen them just wither away and die, and then five years later, someone's like, "Oh, can you fix this line to have one instead of two, and then I'll merge it." I'm like, "I don't care anymore. It was five years ago. I don't work there anymore. I don't need it. If you want it, do it."

So, I get it these days. And these days, if someone creates a patch—just yesterday, someone created a patch to format cdk-github-runners in VS Code. And they did it just, like, a little bit wrong. So, I just fixed it for them and I approved it and pushed it. You know, it's much better. You don't need to bug people for most of it.

Corey: You didn't yell at them for having the temerity to contribute?

Amir: My voice is so raw because I've been yelling for five days at them, yeah.

Corey: Exactly, exactly. I really want to thank you for taking the time to chat with me about how all this stuff came to be and your own path. If people want to learn more, where's the best place for them to find you?

Amir: So, I really appreciate you having me and driving all this traffic to my projects. If people want to learn more, they can always go to cloudsnorkel.com; it has all the projects. github.com/cloudsnorkel has a few more. And then my private blog is kichik.com. So, K-I-C-H-I-K dot com.
I don't post there as much as I should, but it has some interesting AWS projects from the past few years that I've done. Corey: And we will, of course, put links to all of that in the show notes. Thank you so much for taking the time. I really appreciate it. Amir: Thank you, Corey. It was really nice meeting you. Corey: Amir Szekely, owner of CloudSnorkel. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment. Heck, put it on all of the podcast platforms with a step function state machine that you somehow can't quite figure out how the API works. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Victoria is joined by guest co-host Joe Ferris, CTO at thoughtbot, and Seif Lotfy, the CTO and Co-Founder of Axiom. Seif discusses the journey, challenges, and strategies behind his data analytics and observability platform. Seif, who has a background in robotics and was a 2008 Sony AIBO robotic soccer world champion, shares that Axiom pivoted from being a Datadog competitor to focusing on logs and event data. The company even built its own logs database to provide a cost-effective solution for large-scale analytics. Seif is driven by his passion for his team and the invaluable feedback from the community, emphasizing that sales validate the effectiveness of a product. The conversation also delves into Axiom's shift in focus towards developers to address their need for better and more affordable observability tools. On the business front, Seif reveals the company's challenges in scaling across multiple domains without compromising its core offerings. He discusses the importance of internal values like moving with urgency and high velocity to guide the company's future. Furthermore, he touches on the challenges and strategies of open-sourcing projects and advises avoiding platforms like Reddit and Hacker News to maintain focus. Axiom (https://axiom.co/) Follow Axiom on LinkedIn (https://www.linkedin.com/company/axiomhq/), X (https://twitter.com/AxiomFM), GitHub (https://github.com/axiomhq), or Discord (https://discord.com/invite/axiom-co). Follow Seif Lotfy on LinkedIn (https://www.linkedin.com/in/seiflotfy/) or X (https://twitter.com/seiflotfy). Visit his website at seif.codes (https://seif.codes/). Follow thoughtbot on X (https://twitter.com/thoughtbot) or LinkedIn (https://www.linkedin.com/company/150727/). Become a Sponsor (https://thoughtbot.com/sponsorship) of Giant Robots! Transcript: VICTORIA: This is the Giant Robots Smashing Into Other Giant Robots Podcast, where we explore the design, development, and business of great products. I'm your host, Victoria Guido, and with me today is Seif Lotfy, CTO and Co-Founder of Axiom, the best home for your event data. Seif, thank you for joining me. SEIF: Hey, everybody. Thanks for having me. This is awesome. I love the name of the podcast, given that I used to compete in robotics. VICTORIA: What? All right, we're going to have to talk about that. And I also want to introduce a guest co-host today. Since we're talking about cloud, and observability, and data, I invited Joe Ferris, thoughtbot CTO and Director of Development of our platform engineering team, Mission Control. Welcome, Joe. How are you? JOE: Good, thanks. Good to be back again. VICTORIA: Okay. I am excited to talk to you all about observability. But I need to go back to Seif's comment on competing with robots. Can you tell me a little bit more about what robots you've built in the past? SEIF: I didn't build robots; I used to program them. Remember the Sony AIBOs, where Sony made these dog robots? And we would make them compete. There was an international competition where we made them play soccer, and they had to be completely autonomous. They only communicate via Bluetooth or via wireless protocols. And you only have the camera as your sensor as well as...a chest sensor throws the ball near you, and then yeah, you make them play football against each other, four versus four with a goalkeeper and everything. Just look it up: RoboCup AIBO. Look it up on YouTube. And I...2008 world champion with the German team. VICTORIA: That sounds incredible. 
What kind of crowds are you drawing out for a robot soccer match? Is that a lot of people involved with that? SEIF: You would be surprised how big the RoboCup competition is. It's ridiculous. VICTORIA: I want to go. I'm ready. I want to, like, I'll look it up and find out when the next one is. SEIF: No more Sony robots, but other robots. Now, there's two-legged robots. So, they make them play as two-legged robots, much slower than four-legged robots, but it works. VICTORIA: Wait. So, the robots you were playing soccer with had four legs they were running around on? SEIF: Yeah, they were dogs [laughter]. VICTORIA: That's awesome. SEIF: We all get the same robot. It's just a competition on software, right? On a software level. And some other competitions within the RoboCup actually use...you build your own robot and stuff like that. But this one was...it's called the Standard League, where we all have a robot, and we have to program it. JOE: And the standard robot was a dog. SEIF: Yeah, I think back then...we're talking...it's been a long time. I think it started in 2001 or something. I think the competition started in 2001 or 2002. And I competed from 2006 to 2008. Robots back then were just, you know, simple. VICTORIA: Robots today are way too complicated [laughs]. SEIF: Even AI is more complicated. VICTORIA: That's right. Yeah, everything has gotten a lot more complicated [laughs]. I'm so curious how you went from being a world-champion robot dog soccer player [laughs] programmer [laughs] to where you are today with Axiom. Can you tell me a little bit more about your journey? SEIF: The journey is interesting because it came from open source. I used to do open source on the side a lot–part of the GNOME Project. That's where I met Neil and the rest of my team, Mikkel Kamstrup, the whole crowd, basically. We worked on GNOME. We worked on Ubuntu. Like, most of them were working professionally on it. I was working for another company, but we worked on the same project. We ended up at Xamarin, which was bought by Microsoft. And then we ended up doing Axiom. But we've been around each other professionally since 2009, most of us. It's like a little family. But how we ended up exactly in observability, I think it's just trying to fix pain points in my life. VICTORIA: Yeah, I was reading through the docs on Axiom. And there's an interesting point you make about organizations having to choose between how much data they have and how much they want to spend on it. So, maybe you can tell me a little bit more about that pain point and what you really found in the early stages that you wanted to solve. SEIF: So, the early stages of what we wanted to solve we were mainly dealing with...so, the early, early stage, we were actually trying to be a Datadog competitor, where we were going to be self-hosted. Eventually, we focused on logs because we found out that's what was a big problem for most people, just event data, not just metrics but generally event data, so logs, traces, et cetera. We built out our own logs database completely from scratch. And one of the things we stumbled upon was: basically, you have three things when it comes to logging, which are low cost, low latency, and large scale. That's what everybody wants. But you can't get all three of them; you can only get two of them. And we opted...like, we chose large scale and low cost. And when it comes to latency, we say it should be just fast enough, right? And that's what we focused on, and this is how we started building it.
And with that, this is how we managed to stand out, by just having way lower cost than anybody else in the industry and dealing with large scale. VICTORIA: That's really interesting. And how did you approach making the ingestion pipeline for massive amounts of data more efficient? SEIF: Just make it as coordination-free as possible, right? And get rid of Kafka, because Kafka just, you know, drains your...it's where you throw in money. Like, maintaining Kafka...it's like back then with Elasticsearch, right? Elasticsearch was the biggest part of your infrastructure that would cost money. Now, it's also Kafka. So, we found a way to have our own internal way of queueing things without having to rely on Kafka. As I said, we wrote everything from scratch to make it work. Like, every now and then, I think that we can spin this out of the company and make it a new product. But for now, eyes on the prize, right? JOE: It's interesting to hear that somebody who spent so much time in the open-source community ended up rolling their own solution to so many problems. Do you feel like you had some lessons learned from open source that led you to reject solutions like Kafka, or how did that journey go? SEIF: I don't think I'm rejecting Kafka. The problem is how Kafka is built, right? Kafka is still...you have to set up all these servers. They have to communicate, et cetera, et cetera. They didn't build it in a way where it's stateless, and that's what we're trying to go for. We're trying to make things as stateless as possible. So, Kafka was never built for the cloud-native era. And you can't really rely on SQS or something like that because it won't deal with this high throughput. So, that's why I said, like, we will sacrifice some latency, but at least the cost is low. So, if messages show up after half a second or a second, I'm good. It doesn't have to be real-time for me. So, I had to write a couple of these things. But also, it doesn't mean that we reject open source. Like, we actually do like open source. We open-source a couple of libraries. We contribute back to open source, right? We needed a solution back then for that problem, and we couldn't find any. And maybe one day, open source will have one, right? JOE: Yeah. I was going to ask if you considered open-sourcing any of your high latency, high throughput solutions. SEIF: Not high latency. You make it sound bad. JOE: [laughs] SEIF: You make it sound bad. It's, like, fast enough, right? I'm not going to compete on milliseconds because, also, I'm competing with ClickHouse. I don't want to compete with ClickHouse. ClickHouse is low latency and large scale, right? But then the cost is, you know, off the charts a bit sometimes. I'm going the other route. Like, you know, it's fast enough. Like, how, you know, if it's under two, three seconds, everybody's happy, right? If the results come within two, three seconds, everybody is happy. If you're going to build a real-time trading system on top of it, I'll strongly advise against that. But if you're building, you know, you're looking at dashboards, you're more in the observability field, yeah, we're good.
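A minimal sketch of the coordination-free queueing pattern Seif describes here, assuming an S3-style object store in place of a broker. This is an illustration of the general technique, not Axiom's actual implementation; the bucket name and batching thresholds are hypothetical. Writers buffer events and flush them as immutable objects under time-ordered keys, and readers poll by key prefix, so there is no stateful cluster to operate:

```python
import json
import time
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "example-ingest-bucket"  # hypothetical bucket name


class BatchWriter:
    """Buffers events and flushes them as immutable objects.

    No broker and no leader election: each writer flushes independently,
    and ordering comes from the time-prefixed object keys.
    """

    def __init__(self, max_events=1000, max_age_seconds=1.0):
        self.max_events = max_events
        self.max_age = max_age_seconds
        self.buffer = []
        self.started = time.monotonic()

    def append(self, event):
        self.buffer.append(event)
        age = time.monotonic() - self.started
        if len(self.buffer) >= self.max_events or age >= self.max_age:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Time-ordered key prefix lets readers scan recent batches cheaply;
        # the UUID suffix keeps concurrent writers from colliding.
        key = f"events/{int(time.time())}-{uuid.uuid4().hex}.ndjson"
        body = "\n".join(json.dumps(e) for e in self.buffer)
        s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode())
        self.buffer = []
        self.started = time.monotonic()


def read_batches(prefix="events/"):
    """Reader side: list objects under the prefix and yield parsed events."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            data = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
            for line in data.decode().splitlines():
                yield json.loads(line)
```

The tradeoff is the one Seif names: events become visible only after a flush interval rather than instantly, but there is no broker cluster to maintain, and cost scales with storage rather than with provisioned capacity.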
VICTORIA: Yeah, I'm curious what you found, like, which customer personas that market really resonated with. Like, is there a particular, like, industry type where you're noticing they really want to lower their cost, and they're okay with this just-fast-enough latency? SEIF: Honestly, with the current recession, everybody is okay with giving up some of the speed to reduce the money, because I think it's not a linear reduction. It's more an exponential reduction at this point, right? You give up a second, and you're saving 30%. You give up two seconds, and all of a sudden, you're saving 80%. So, I'd say in the beginning, everybody thought they needed everything to be very, very fast. And now they're realizing, you know, with the limitations you have around your budget and spending, you're like, okay, I'm okay with the speed. And, again, we're not slow. I'm just saying people realize they don't need everything under a second. They're okay with waiting for two seconds. VICTORIA: That totally resonates with me. And I'm curious if you can add maybe a non-technical or a real-life example of, like, how this impacts the operations of a company or organization, like, if you can give us, like, a business-y example of how this impacts how people work. SEIF: I don't know how, like, how do people work on that? Nothing changed, really. They're still doing the same, like...really nothing, because...the aspect is, you run a query, and, again, as I said, you're not getting the result in a second. You're just waiting two seconds or three seconds, and it's there. So, nothing really changed. I think people can wait three seconds. And we're still, like–when I say this, we're still faster than most others. We're just not as fast as people who are trying to compete on a millisecond level. VICTORIA: Yeah, that's okay. Maybe I'll take it back even, like, a step further, right? Like, our audience is really sometimes just founders who almost have no formal technical training or background. So, when we talk about observability, sometimes people who work in DevOps and operations all understand it and kind of know why it's important [laughs] and what we're talking about. So, maybe you could, like, go back to -- SEIF: Oh, if you're asking about new types of people who've been using it -- VICTORIA: Yeah. Like, if you're going to explain to, like, a non-technical founder, like, why your product is important, or, like, how people in their organization might use it, what would you say? SEIF: Oh, okay, if you put it like that. It's more of: if you have data, timestamped data, and you want to run analytics on top of it, so that could be transactions, that could be web vitals, rather than count every time somebody visits, you have a timestamp. So, you can count, like, how many visitors visited the website and, you know, all these kinds of things. That's where you want to use something like Axiom. That's outside the DevOps space, of course. And in the DevOps space, there's so many other things you use Axiom for, but that's outside the DevOps space. And we actually...we implemented a zero-config integration with Vercel that kind of went viral. And we were, for a while, the number one enterprise self-integration because so many people were using it. So, Vercel users are usually not necessarily writing the most complex backends, but a lot of things are happening on the front-end side of things. And we would be giving them dashboards, automated dashboards about, you know, latencies, and how long a request took, and how long the response took, and the content type, and the status codes, et cetera, et cetera. And there's a huge user base around that. VICTORIA: I like that. And it's something, for me, you know, as a managing director of our platform engineering team, I want to talk more to founders about. It's great that you put this product and this app out into the world. But how do you know that people are actually using it?
How do you know that people, like, maybe, are they all quitting after the first day and not coming back to your app? Or maybe, like, the page isn't loading or, like, it's not working as they expected it to. And, like, if you don't have anything observing what users are doing in your app, then it's going to be hard to show that you're getting any traction and know where you need to go in and make corrections and adjust. SEIF: We have two ways of doing this. Right now, internally, we use our own tools to see, like, who is sending us data. We have a deployment that's monitoring the production deployment. And we're just, you know, seeing how people are using it, how much data they're sending every day, who stopped sending data, who spiked in sending data, et cetera. But we're also using Mixpanel, and Dominic, our Head of Product, implemented a couple of key metrics for that specifically. So, we know, like, what's the average time until somebody goes from building their own queries with the builder to writing APL, or how long it takes them from, you know, running two queries to five queries. And, you know, we just started measuring these things now. And it's been going...we've been growing healthily around that. So, we tend to measure user interaction, but also, we tend to measure how much data is being sent. Because, let's keep in mind, usually, people go in and check for things if there's a problem. So, if there's no problem, the user won't interact with us much unless there's a notification that kicks off. We also just check, like, how much data is being sent to us the whole time. VICTORIA: That makes sense. Like, you can't just rely on, like, well, if it was broken, they would write a [chuckles], like, a question or something. So, how do you get those metrics and that data around their interactions? So, that's really interesting. So, I wonder if we can go back and talk about, you know, we already mentioned a little bit about, like, the early days of Axiom and how you got started. Was there anything that you found in the early discovery process that was surprising and made you pivot strategy? SEIF: A couple of things. Basically, people don't really care about the tech as much as they care [inaudible 12:51] and the packaging, so that's something that we had to learn. And number two, continuous feedback. Continuous feedback changed the way we worked completely, right? And, you know, after that, we had a Slack channel, then we opened a Discord channel. And, like, this continuous feedback coming in just helps with iterating, helps us with prioritizing, et cetera. And that changed the way we actually developed product. VICTORIA: You use Slack and Discord? SEIF: No. No Slack anymore. We had a community Slack. We had a community [inaudible 13:19] Slack. Now, there's no community Slack. We only have a community Discord. And the community Slack is...sorry, internally, we use Slack, but there's a community Discord for the community. JOE: But how do you keep that staffed? Is it, like, everybody is in the Discord during working hours? Is it somebody's job to watch out for community questions? SEIF: I think everybody gets involved now just...and you can see it. If you go on our Discord, you will just see it. Just everyone just gets involved. I think just people are passionate about what they're doing. At least most people are involved on Discord, right? Because there's, like, the Discord help sections, and people are just asking questions and other people answering.
And now, we've reached a point where people in the community start answering the questions for other people in the community. So, that's how we see it's starting to become a healthy community, et cetera. But that is one of my favorite things: when I see somebody from the community answering somebody else, that's a highlight for me. Actually, we hired somebody from that community because they were so active. JOE: Yeah, I think one of the biggest signs that a product is healthy is when there's a healthy ecosystem building up around it. SEIF: Yeah, and Discord reminds me of the old days of open source, like IRC, just with memes now. But because all of us come from the old IRC days, being on Discord and chatting around, et cetera, et cetera, just gives us this momentum back, whereas Slack always felt a bit too businessy to me. JOE: Slack is like IRC with emoji. Discord is IRC with memes. SEIF: I would say Slack reminds me somehow of MSN Messenger, right? JOE: I feel like there's a huge slam on MSN Messenger here. SEIF: [laughs] What do you guys use internally, Slack or? I think you're using Slack, right? Or Teams. Don't tell me you're using Teams. JOE: No, we're using Slack. SEIF: Okay, good, because I shit-talk. Like, there is this...I'll shit-talk here when I start talking about Teams, so...I remember that one thing Google did once, and that failed miserably. JOE: Google still has, like, seven active chat products. SEIF: Like, I think every department or every, like, group of engineers just uses one of them internally. I'm not sure. Never got to that point. But hey, who am I to judge? VICTORIA: I just feel like I end up using all of them, and then I'm just rotating between different tabs all day long. You maybe talked me into using Discord. I feel like I've been resisting it, but you got me with the memes. SEIF: Yeah, it's definitely worth it. It's more entertaining. More noise, but more entertaining. You feel it's alive, whereas Slack is...also because there's no, like...history is forever. So, you always go back, and you're like, oh my God, what the hell is this? VICTORIA: Yeah, I have, like, all of them. I'll do anything. SEIF: They should be using Axiom in the background. Just send data to Axiom; we can keep your chat history. VICTORIA: Yeah, maybe. I'm so curious because, you know, you mentioned something about how you realized that it didn't matter really how cool the tech was if the product packaging wasn't also appealing to people. Because you seem really excited about what you've built. So, I'm curious, so just tell us a little bit more about how you went about trying to, like, promote this thing you built. Or was, like, the continuous feedback really early on, or how did that all kind of come together? SEIF: The continuous feedback helped us with performance, but actually getting people to sign up and pay money, it started early on. But with Vercel, it kind of skyrocketed, right? And that's mostly because we went with the whole zero-config approach where it's just literally two clicks. And all of a sudden, Vercel is sending your data to Axiom, and that's it. We will create [inaudible 16:33]. And we worked very closely with Vercel to do this, to make this happen, which was awesome. Like, yeah, hats off to them. They were fantastic. And just two clicks, three clicks away, and all of a sudden, we've created an Axiom organization for you, the data set for you. And then the data from Vercel is being forwarded to it.
I think that packaging was so simple that it made people try it out quickly. And then, the experience of actually using Axiom was sticky, so they continued using it. And then the price was so low because we give 500 gigs for free, right? You send us 500 gigs a month of logs for free, and we don't care. And you can start off here with one terabyte for 25 bucks. So, people just started signing up. Now, before that, it was five terabytes a month for $99, and then we changed the plan. But yeah, it was cheap enough, so people just started sending us more and more and more data eventually. We changed the way people think, from “what am I going to send to Axiom?” or “what am I going to send to my logs provider or log storage?” to “how much more can I send?” And I think that's what we wanted to reach. We wanted people to think, how much more can I send? JOE: You mentioned latency and cost. I'm curious about...the other big challenge we've seen with observability platforms, including logs, is cardinality of labels. Was there anything you had to sacrifice upfront in terms of cardinality to manage either cost or volume? SEIF: No, not really. Because the way we designed it was that we should be able to deal with high cardinality from scratch, right? I mean, there's open-source ways of doing it. Like, if you look at a column store, where every dimension is its own column, you can limit the amount of columns you're creating, but you should never limit the amount of different values a column could hold. So, if you have something like tags, right? Let's say hostname: hostname should be a column, but then the different hostnames you have, we never limit that. So, the cardinality on a value is something that is unlimited for us, and we don't really see it in cost. It doesn't really hit us on cost. It reflects a bit on compression, if you get into the technical details of that, because, you know, high cardinality means a lot of different data. So, compression is harder, because it's not repetitive. But then if you look at, you know, oh, I want to send a lot of different types of fields, not values within fields, so you have hostname, and latency, and whatnot, et cetera, et cetera, yeah, that's where the limitation starts, because then you're going to a wider and wider dimension. But even that, we, yeah, we can deal with thousands at this point. And we realized, like, most people will not need more than three or four. It's like a Postgres table. You don't need more than 3,000 to 4,000 columns; otherwise, you know, you're doing a lot. JOE: I think it's actually pretty compelling in terms of cost, though. Like, that's one of the things we've had to be most careful about in terms of containing cost for metrics and logs: a lot of providers will either charge you based on the number of unique metric combinations, or the performance suffers greatly. Like, we've used a lot of Prometheus-based solutions. And so, when we're working with developers, even though they don't need more than, you know, a few dozen metric combinations most of the time, it's hard for people to think of what they need upfront. It's much easier after you deploy it to be able to query your data and slice it retroactively based on what you're seeing. SEIF: That's the detail.
When you say “we're using Prometheus”: a lot of the metrics tools out there, just like Prometheus, are using the Gorilla data structure. And that data structure was never designed to deal with high-cardinality labels. So, basically, to put it in a simple way, every combination of tags you send for metrics is its own file on disk. That's, like, the very simple way of explaining this. And then, when you're trying to search through everything, right, and you have a lot of these combinations, you actually have to get all these files and merge them back together, you know, and then they're chunked, et cetera. So, it's a problem. Generally, that's how metrics are doing it...most metrics products are using it, even VictoriaMetrics, et cetera. What they're doing is they're using the Prometheus TSDB data structure, which is based on Gorilla. Influx was doing the same thing. They pivoted to using more and more the ones like we use, and Honeycomb uses, right? So, we might not be as fast on the metrics side as these highly optimized ones. But then when it comes to high [inaudible 20:49], once we start dealing with high cardinality, we will be faster than those solutions. And that's on a very technical level. JOE: That's pretty cool. I realize we're getting pretty technical here. Maybe it's worth defining cardinality for the audience. SEIF: Defining cardinality to the...I mean, we just did that, right? JOE: What do you think, Victoria? Do you know what cardinality is now? [laughs] VICTORIA: All right. Now I'm like, do I know? I was like, I think I know what it means. Cardinality is, like, let's say you have a piece of data like an event or a transaction. SEIF: It's like the distinct count on a property; that gives you the cardinality of a property. VICTORIA: Right. It's like how many pieces of information you have about that one event, basically, yeah. JOE: But with some traditional metrics stores, it's easy to make mistakes. For example, you could have unbounded cardinality by including response time as one of the labels -- SEIF: Tags. JOE: And then it's just going to -- SEIF: Oh, no, no. Let me give you a better one. I put in a timestamp at some point in my life. JOE: Yeah, I feel like everybody has done that one. [laughter] SEIF: I've put a system timestamp at some point in my life. There was the actual timestamp, and there was a system timestamp that I would put because I wanted to know when the...because I couldn't control the timestamp, and the only timestamp I had was a system timestamp. I would always add the actual timestamp of when that event actually happened into a metric, and yeah, that did not scale.
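A rough back-of-the-envelope illustration of the series explosion being described here: in a Gorilla-style TSDB, the worst-case number of series, each its own file in Seif's simplified telling, is the product of each label's distinct-value count. The label names and counts below are made up purely for illustration:

```python
from math import prod


def series_count(label_cardinalities):
    """Worst-case series count: the product of distinct values per label."""
    return prod(label_cardinalities.values())


# A tame metric: a handful of bounded labels.
tame = {"service": 20, "region": 5, "status_code": 8}
print(series_count(tame))  # 800 series: manageable

# The mistake described above: a near-unique value as a label.
# One distinct timestamp per sample makes every sample its own series.
bad = dict(tame, system_timestamp=1_000_000)
print(series_count(bad))  # 800,000,000 series: this does not scale
```

A column store sidesteps this because a high-cardinality value is just more rows in one column rather than more files, which is the distinction Seif draws between Axiom's storage engine and Prometheus-style TSDBs.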
MID-ROLL AD: Are you an entrepreneur or start-up founder looking to gain confidence in the way forward for your idea? At thoughtbot, we know you're tight on time and investment, which is why we've created targeted 1-hour remote workshops to help you develop a concrete plan for your product's next steps. Over four interactive sessions, we work with you on research, product design sprint, critical path, and presentation prep so that you and your team are better equipped with the skills and knowledge for success. Find out how we can help you move the needle at tbot.io/entrepreneurs. VICTORIA: Yeah. I wonder if you could maybe share, like, a story about when it's gone wrong, and you've suddenly charged a lot of money [laughs] just to get information about what's happening in the system. Any, like, personal experiences with observability that kind of informed what you did with Axiom? SEIF: Oof, I have a very bad one, like, a very, very bad one. I used to work for a company. We had to deploy Elasticsearch on Windows Servers, and it was US-East-1. So, just the combination of Elasticsearch back in 2013, 2014, together with Azure and Windows Server, was not a good idea. So, you see where this is going, right? JOE: I see where it's going. SEIF: Eventually, we got all these problems because we used Elasticsearch and Kibana as our, you know, observability platform to measure everything around the product we were building. And funny enough, it cost us more than actually maintaining the infrastructure of the product. But not just that, it also kept me up longer, because most of the downtimes I would get were not because of the product going down. It's because my Elasticsearch cluster started going down, and there's reasons for that. Because back then, Microsoft Azure thought that it's okay for any VM to lose connection with the rest of the VMs for 30 seconds per day. And then, all of a sudden, you have Elasticsearch with a split-brain problem. And there was a phase where I started getting alerted so much that, back then, my partner threatened to leave me. So I bought a...what I think was a shock bracelet or a shock collar via Bluetooth, and I connected it to my phone for any notification. And I bought that off Alibaba, by the way. And I would charge it at night, put it on my wrist, and go to sleep. And then, when an alert happened, it would fully discharge the battery on me every time. JOE: Okay, I have to admit, I did not see where that was going. SEIF: Yeah, I did that for a while; definitely did not save my relationship either. But eventually, that was the point where, you know, we started looking into other observability tools like Datadog, et cetera, et cetera. And that's where the actual journey began, where we moved away from Elasticsearch and Kibana to look for something, okay, that we don't have to maintain ourselves and we can use, et cetera. So, it's not about the costs as much; it was just pain. VICTORIA: Yeah, pain is a real pain point, actual physical [chuckles] and emotional pain point [laughter]. What, like, motivates you to keep going with Axiom and to keep, like, the wind in your sails to keep working on it? SEIF: There's a couple of things. I love working with my team. So, honestly, I just wake up, and I compliment my team. I just love working with them. They're a lot of fun to work with. And they challenge me, and I challenge them back. And I upset them a lot. And they can't upset me, but I upset them. But I love working with them, and I love working with that team. And the other thing is having this constant feedback from customers; it just makes you want to do more and, you know, close sales, et cetera. It's interesting, like, how I'm a very technical person, and yet I'm more interested in sales, because sales means your product works, the product, the technical parts, et cetera. Because if technically it's not working, you can't build a product on top of it. And if you're not selling it, then what's the point? You only sell when the product is good, more or less, unless you're Oracle. VICTORIA: I had someone ask me about Oracle recently, actually. They're like, "Are you considering going back to it?" And I'm maybe a little allergic to it from having a federal consulting background [laughs]. But maybe they'll come back around. I don't know. We'll see. SEIF: Did you sell your soul back then?
VICTORIA: You know, I feel like I just grew up in a place where that's just what everyone did. SEIF: It was Oracle, IBM, or HP back in the day. VICTORIA: Yeah. Well, basically, when you're working on applications that were built in, like, the '80s, Oracle was, like, this hot, new database technology [laughs] that they just got five years ago. So, that's just, yeah, interesting. SEIF: Although, from a database perspective, they did a lot of the innovations. A lot of first innovations could have come from Oracle. From a technical perspective, they're ridiculous. I'm not sure from a product perspective how good they are. But I know their sales team is so big, so huge. They don't care about the product anymore. They can still sell. VICTORIA: I think, you know, everything in tech is cyclical. So, you know, if they have the right strategy and they're making some interesting changes over there, there's always a chance [laughs]. Certain use cases, I mean, I think that's the interesting point about working in technology: that, you know, every company is a tech company. And so, there's just a lot of different types of people, personas, and use cases for different types of products. So, I wonder, you know, you kind of mentioned earlier that, like, everyone is interested in Axiom. But, you know, I don't know, are you narrowing the market? Or, like, how are you trying to kind of focus your messaging and your sales for Axiom? SEIF: I'm trying to focus on developers. So, we're really trying to focus on developers, because the experience around observability is crap. It's stupid expensive. Sorry for being straightforward, right? And that's what we're trying to change. And we're targeting developers mainly. We want developers to like us. And we'll find all these different types of developers who are using it, and that's the interesting thing. And because of them, we start adding more and more features, like, you know, we added tracing, and now that enables, like, billions of events pushed through for, you know, again, for almost no money, again, $25 a month for a terabyte of data. And we're doing this with metrics next. And that's just to address the developers who have been giving us feedback and the market demand. I will sum it up again: the experience is crap, and it's stupid expensive. I think that's the [inaudible 28:07] of observability; that's just how I would sum it up. VICTORIA: If you could go back in time and talk to yourself when you were still a developer, now that you're CTO, what advice would you give yourself? JOE: Besides avoiding shock collars. VICTORIA: [laughs] Yes. SEIF: Get people's feedback quickly, so you know you're on the right track. I think that's very, very, very, very important. Don't just work in the dark, or don't go too long into stealth mode, because, eventually, people catch up. Also, ship when you're 80% ready, because 100% is too late. I think it's the same thing here. JOE: Ship often and early. SEIF: Yeah, even if it's not fully ready, it's still feedback. VICTORIA: Ship often and early and talk to people [laughs]. Just, do you feel like, as a developer, did you have the skills you needed to be able to get the most out of that feedback and out of those conversations you were having with people around your product? SEIF: I still don't think I'm good enough. You're just constantly learning, right? I just accepted I'm part of a team, and I have my contributions. But as an individual, I still don't think I know enough.
I think there's more I need to learn at this point. VICTORIA: I wonder, what questions do you have for me or Joe? SEIF: How did you start your podcast, and why the name? VICTORIA: Oh, man, I hope I can answer. So, the podcast was started...I think it's, like, we're actually about to be at our 500th episode. So, I've only been a host for the last year. Maybe Joe even knows more than I do. But what I recall is that one person at thoughtbot thought it would be a great idea to start a podcast, and then they did it. And it seems like the whole company is obsessed with robots. I'm not really sure where that came from. There used to be a tiny robot in the office, is what I remember. And people started using that as, like, the mascot. And then, yeah, that's it, that's the whole thing. SEIF: Was the robot doing anything useful or just being cute? JOE: It was just cute, and it's hard to make a robot cute. SEIF: Was it a real robot, or was it like a -- JOE: No, there was, at one point, a toy robot. The name...I actually forget the origin–origin of the name, but the name Giant Robots comes from our blog. So, we named the podcast the same as the blog: Giant Robots Smashing Into Other Giant Robots. SEIF: Yes, it's called Transformers. VICTORIA: Yeah, I like it. It's, I mean, now I feel like -- SEIF: [laughs] VICTORIA: We got to get more, like, robot dogs involved [laughs] in the podcast. SEIF: Like, I wanted to add one thing when we talked about, you know, what gets me going. And I want to mention that I have a six-month-old son now. He definitely adds a lot of motivation for me to wake up in the morning and work. But he also makes me wake up regardless if I want to or not. VICTORIA: Yeah, you said you had invented an alarm clock that never turns off. Never snoozes [laughs]. SEIF: Yes, absolutely. VICTORIA: I have the same thing, but it's my dog. But he does snooze, actually. He'll just, like, get tired and go back to sleep [laughs]. SEIF: Oh, I have a question. Do dogs have a Tamagotchi phase? Because, like, my son, for the first three months, was like a Tamagotchi. It was easy to read him. VICTORIA: Oh yeah, uh-huh. SEIF: Noisy but easy. VICTORIA: Yes, yes. SEIF: Now, it's just like, yeah, I don't know, like, the last month he has opinions at six months. I think it's because I raised him in Europe. I should take him back to the Middle East [laughs]. No opinions. VICTORIA: No, dogs totally have, like, a communication style, you know, I pretty much know what he, I mean, I can read his mind, obviously [laughs]. SEIF: Sure, but that's when they grow a bit. But what about when they were very...when the dog was very young? VICTORIA: Yeah, they, I mean, they also learn, like, your stuff, too. So, they, like, learn how to get you to do stuff or, like, I know she'll feed me if I'm sitting here [laughs]. SEIF: And how much is one dog year, seven years? VICTORIA: Seven years. SEIF: Seven years? VICTORIA: Yeah, seven years? SEIF: Yeah. So, basically, in one year, like, three months, he's already...in one month, he's, you know, seven months old. He's like, yeah. VICTORIA: Yeah. In a year, they're, like, teenagers. And then, in two years, they're, like, full adults. SEIF: Yeah. So, the first month is basically going through the first six months of a human being. So yeah, you pass...the first two days or three days are the Tamagotchi phase that I'm talking about.
VICTORIA: [chuckles] I read this book, and it was, like, to understand dogs, it's like, they're just like humans that are trying to, like, maximize the number of positive experiences that they have. So, like, if you think about that framing around all your interactions about, like, maybe you're trying to get your son to do something, you can be like, okay, how do I, like, I don't know, train him that good things happen when he does the things I want him to do? [laughs] That's kind of maybe manipulative but effective. So, you're not learning baby sign language? You're just, like, going off facial expressions? SEIF: I started. I know what Mama looks like. I know what Dada looks like. I know what “more” looks like, slowly. And he already does this thing where I know when he's uncomfortable: he starts opening and closing his hands. And when he's completely uncomfortable and basically needs to go to sleep, he starts pulling his own hair. VICTORIA: [laughs] I do the same thing [laughs]. SEIF: You pull your own hair when you go to sleep? I don't have that. I don't have hair. VICTORIA: I think I do start, like, touching my head though, yeah [inaudible 33:04]. SEIF: Azure took the last bit of hair I had! Went away with Azure, Elasticsearch, and the shock collar. VICTORIA: [laughs] SEIF: I have none of them left. Absolutely nothing. I should sue Elasticsearch for this shit. VICTORIA: [laughs] Let me know how that goes. Maybe there's more people who could join your lawsuit, you know, with a class action. SEIF: [laughs] Yeah. Well, one thing I wanted to also just highlight is, right now, one of the things that also makes the company move forward is we realized that in a single domain, we proved ourselves very valuable to specific companies, right? So, that was a big, big milestone for us. And now we're trying to move into a handful of domains and see which one of those works out best for us. Does that make sense? VICTORIA: Yeah. And I'm curious: what are the biggest challenges or hurdles that you associate with that? SEIF: At this point, you don't want just feedback. You want constructive criticism. Like, you want to work with people who will criticize the applic...and you iterate with them based on this criticism, right? They're not just happy with you. And you're trying to create design partners. So, for us, it was very important to have these small design partners who can work with us to actually prove ourselves as valuable in a single domain. Right now, we need to find a way to scale this across several domains. And how do you do that without sacrificing? Like, how do you open into other domains without sacrificing the original domain you came from? So, there's a lot of things [inaudible 34:28]. And we are in the middle of this. Honestly, I Forrest Gumped my way through half of this, right? Like, I didn't know what I was doing. I had ideas. I think it's more of luck at this point. And I had luck. No, we did work. We did work a lot. We did sleepless nights and everything. But I think, in the last three years, we became more mature and started thinking more about product. And as I said, like, our CEO, Neil, and Dominic, our head of product, are putting everything behind being a product-led organization, not just a tech-led organization. VICTORIA: That's super interesting. I love to hear that that's the way you're thinking about it. JOE: I was just curious what other domains you're looking at pushing into, if you can say. SEIF: So, we are going to start moving into ETL a bit more.
We're trying to see how we can fit in specific ML scenarios. I can't say more about the others, though. JOE: Do you think you'll take the same approaches in terms of value proposition, like, low cost, good enough latency? SEIF: Yes, that's definitely one thing. But there's also...so, this is the values we're bringing to the customer. But also, now, our internal values are different. Now it's more of move with urgency and high velocity, as we said before, right? Think big, work small. In terms of the values we're going to take to the customers, it's the same ones. And maybe we'll add some more, but it's still going to be low-cost and large-scale. And, internally, we're just becoming more, excuse my French, agile. I hate that word so much. Should be good with Scrum. VICTORIA: It's painful, but everyone knows what you're talking about [laughs], you know, like -- SEIF: See, I have opinions here about Scrum. I think Scrum should be only used in terms of iceScrum [inaudible 36:04], or something like that. VICTORIA: Oh no [laughter]. Well, it's a rugby term, right? Like, that's where it should probably stay. SEIF: I did not know it's a rugby term. VICTORIA: Yeah, so it should stay there, but -- SEIF: Yes [laughs]. VICTORIA: Yeah, I think it's interesting. Yeah, I like the being flexible. I like the just, like, continuous feedback and how you all have set up to, like, talk with your customers. Because you mentioned earlier that, like, you might open source some of your projects. And I'm just curious, like, what goes into that decision for you when you're going to do that? Like, what makes you think this project would be good for open source, or when do you think, actually, we need to, like, keep it? SEIF: So, we open source libraries, right? We actually do that already. And some other big organizations use our libraries; even our competitors use our libraries. That, we do. The whole product itself, or at least a big part of the product, like the database, I'm not sure we're going to open source that, at least not anytime soon. And if we open source it, it's going to be at a point where the value-add it brings is nothing compared to how good our product is, right? So, if we can replace whatever's at the back...the storage engine we have in the back...with something else and the product doesn't get affected, that's when we open source it. VICTORIA: That's interesting. That makes sense to me. But yeah, thank you for clarifying that. I just wanted to make sure to circle back. Since you have this big history in open source, yeah, I'm curious if you see... SEIF: Burning me out? VICTORIA: Burning you out, yeah [laughter]. Oh, that's a good question. Yeah, like, because, you know, we're about to be in October here. Do you have any advice or strategies as a maintainer for not getting burned out during the next couple of weeks, besides, like, hiding in a cave without internet access [laughs]? SEIF: Stay away from Reddit and Hacker News. That's my goal for October now, because I'm always afraid of getting so attached to an idea, or so motivated or excited by an idea, that I drift away from what I'm actually supposed to be doing. VICTORIA: Last question is, is there anything else you would like to promote? SEIF: Yeah, check out our website; I think it's at axiom.co. Check it out. Sign up. And comment on Discord and talk to me. I don't bite. I'm sometimes grumpy, but that's just because of lack of sleep in the morning. But, you know, around midday, I'm good.
And if you're ever in Berlin and you want to hang out, I'm more than willing to hang out. VICTORIA: Whoo, that's awesome. Yeah, Berlin is great. I was there a couple of years ago but no plans to go back anytime soon, but maybe I'll keep that in mind. You can subscribe to the show and find notes along with a complete transcript for this episode at giantrobots.fm. If you have questions or comments, email us at hosts@giantrobots.fm. And you could find me on Twitter @victori_ousg. And this podcast is brought to you by thoughtbot and produced and edited by Mandy Moore. Thanks for listening. See you next time. Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us. More info on our website at tbot.io/referral. Or you can email us at referrals@thoughtbot.com with any questions. Special Guests: Joe Ferris and Seif Lotfy.
In this episode I talk with Simon Parisot, CEO and co-founder of the neo-bank Blank, which offers business accounts for independent workers. We discuss the serverless approach the platform has used from the start, on a banking platform built in 9 months. And why be heavily dependent on a provider like AWS? What are its advantages and drawbacks? - Speed, cost, and recruiting the best people (devs capable of doing serverless development, scalability…) - But also performance and carbon emissions, - Its great strengths and contributions, - And the difficulty of working serverless.
Welcome to episode 220 of The Cloud Pod podcast - where the forecast is always cloudy! This week your hosts, Justin, Jonathan, Ryan, and Matthew discuss all things cloud, including virtual machines, an AI partnership between Microsoft and Meta for Llama 2, Lambda functions, Fargate, and lots of security updates including the Outlook breach and WORM protections. This and much more in our newest episode. Titles we almost went with this week:
Too Many Bees Died for Honeycode
Microsoft announces that AI will only cost you 3 arms and a leg
The Cloud Pod also detects Recursive Loops in cloud news
The Cloud Pod disables health checks because who needs them
A big thanks to this week's sponsor: Foghorn Consulting provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you have trouble hiring? Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.
Two brothers discussing all things AWS every week. Hosted by Andreas and Michael Wittig, presented by cloudonaut.
Welcome to the newest episode of The Cloud Pod podcast - where the forecast is always cloudy! Today your hosts are Jonathan and Matt as we discuss all things cloud and AI, including Temporary Elevated Access Management (or TEAM, since we REALLY like acronyms today), FTP servers, SQL servers and all the other servers, as well as pipelines, whether or not the government should regulate AI (spoiler alert: the AI companies don't think so), and some updates to security at Amazon and Google. Titles we almost went with this week:
The Cloud Pod's FTP server now with post-quantum keys support
The Cloud Pod can now TEAM into your account, but only temporarily
The Cloud Pod dusts off their old floppy drive
The Cloud Pod dusts off their old SQL server disks
The Cloud Pod is feeling temporarily elevated to do a podcast
The Cloud Pod promises that AI will not take over the world
The Cloud Pod duels with keys
The Cloud Pod is feeling temporarily elevated
A big thanks to this week's sponsor: Foghorn Consulting provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you have trouble hiring? Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.
Jeremy Snyder, Founder of FireTail, joins Corey on Screaming in the Cloud to discuss his career journey and what led him to start FireTail. Jeremy reveals what's changed in cloud since he was an AE at AWS, and walks through how the need for customization in cloud security has led to a boom in the number of security companies out there. Corey and Jeremy also discuss the costs of cloud security, and Jeremy points out some of his observations in the world of cloud security pricing and packaging. About Jeremy: Jeremy is the founder and CEO of FireTail.io, an end-to-end API security startup. Prior to FireTail, Jeremy worked in M&A at Rapid7, a global cyber leader, where he worked on the acquisitions of 3 companies during the pandemic. Jeremy previously led sales at DivvyCloud, one of the earliest cloud security posture management companies, and also led AWS sales in southeast Asia. Jeremy started his career with 13 years in cyber and IT operations. Jeremy has an MBA from Mason, a BA in computational linguistics from UNC, and has completed additional studies in Finland at Aalto University. Jeremy speaks 5 languages and has lived in 5 countries. Once, Jeremy went 5 days without seeing another human, but saw plenty of reindeer. Links Referenced: Firetail: https://firetail.io Email: jeremy@firetail.io Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Jeremy Snyder, who's the founder at Firetail. Jeremy, thank you for joining me today. I appreciate you taking the time from your day to suffer my slings and arrows. Jeremy: My pleasure, Corey. I'm really happy to be here. Corey: So, we'll get to a point where we talk about what you're up to these days, but first, I want to dive into the jobs of yesteryear because over a decade ago, you did a stint at AWS doing sales. And not to besmirch your hard work, but it feels like at the time, that must have been a very easy job. Because back then it really felt across the board like the sales motion was basically responding to, “Well, why should we do business with you?” And the response is, “Oh, you misunderstand. You have 87 different accounts scattered throughout your organization. I'm just here to give you visibility, governance, and possibly some discounting over that.” It feels like times have changed in a lot of ways since then. Is that accurate? Jeremy: Well, yeah, but I will correct a couple of things in there. In my days— Corey: Oh, please. Jeremy: —almost nobody had more than one account. I was in the one-account, no-VPCs, you-only-separate-your-workloads-by-tagging days of AWS. So, our job was actually a lot harder at the time, because people couldn't wrap their heads around the lack of subnetting, the lack of workload segregation. All of that was really, like, brand new to people, and so you were trying to tell them like, “Hey, you're going to be launching something on an EC2 instance that's in the same subnet as everybody else's EC2 instance.” And people were really worried about lateral traffic and sniffing and what their neighbors or other customers on AWS could see.
And by the way, I mean, these were the customers who even believed it was real. You know, a lot of the conversations we went into with people were, “Oh, so Amazon bought too many servers and you're trying to sell us excess capacity.” Corey: That legend refuses to die. Jeremy: And, you know, it is a legend. That is not at all the genesis of AWS. And you know, the genesis is pretty well publicized at this point; you can just go google “how did AWS start?” You can find accurate stuff around that. Corey: I did it a few years ago with multiple Amazon execs and published it, and they said definitively that that story was not true. And you can say a lot about AWS folks, and I assure you, I do, but I also do not catch them lying to my face, ever. And as soon as that changes, well, now we're going to have a different series of [laugh] conversations that are a lot more pointed. But they've earned some trust there. Jeremy: Yeah, I would agree. And I mean, look, I saw it internally; the way that Amazon built stuff was at such a breakneck pace. The challenge that they had (and this is the published version of events for why AWS got created) was that developers needed a place to test code. And that was something that they could not get until they got EC2, or could not get in a reasonable enough timeframe for it to be, you know, real-time valid or relevant for what was going on with the company. So, you know, that really is the genesis of things, and you know, the early services, SQS, S3, EC2, they all really came out of that journey. But yeah, in our days at AWS, there was a lot of ease, in the sense that lots of customers had pent-up frustrations with their data center providers or their colo providers, and lots of customers would experience bursts and they would have capacity constraints and they would need a lot of the features that AWS offered. But we had to overcome a lot of technical misunderstandings and trust issues and, you know, oh, hey, Amazon just wants to sniff our data and they want to see what we're up to, and explain to them how encryption works and why they have their own keys and all these things. You know, we had to go through a lot of that. So, it wasn't super easy, but there was some element of it where, you know, just demand actually did make some aspects easy. Corey: What have you seen change since, well, I guess, ten years ago, and now? And let's be clear, you don't work in AWS sales, but you also are not oblivious to what the market is doing. Jeremy: For sure. For sure. I left AWS in 2011 and I've stayed in the cloud ecosystem pretty much ever since. I did spend some time working for a system integrator where all we did was migrate customers to AWS. And then I spent about five, six years working on cloud security primarily focused on AWS, a lot of GCP, a little bit of Azure. So yeah, I mean, I certainly stay up to date with what's going on in the state of cloud. I mean, look, Cloud has evolved from this kind of, you know, developer-centric, very easy-to-launch type of platform into a fully-fledged enterprise IT platform, and all of the management structures and all of the kind of bells and whistles that you would want (that you probably wanted from your old VMware networks but never really got), they're all there now. It is a very different ballgame in terms of what the platform actually enables you to do, but fundamentally, a lot of the core building block constructs and the primitives are still kind of driving the heart of it.
It's just a lot of nicer packaging.What I think is really interesting is actually how customers' usage of cloud platforms has changed over time. And I always think of it kind of like: going back to my days, what did I see from my customers? And it was kind of like the month zero, “I just don't believe you.” Like, “This thing can't be real, I don't trust it, et cetera.” Month one is, I'm going to assign some developer to work on some very low-priority, low-risk workload. In my days, that was SharePoint, by the way. Like, nine times out of ten, the first workload that customers stood up was a SharePoint instance that they had to share across multiple locations.Corey: That thing falls over all the time anyway. May as well put it in the cloud where it can do so without taking too much else down with it. Was that the thinking, or?Jeremy: Well, and the other thing about it at the time, Corey, was that, like, so many customers worked in this, like, remote-first world, right? And so, SharePoint was inevitably hosted at somebody's office. And so, the workers at that office were so privileged over the workers everywhere else. The performance gap between consuming SharePoint in one location versus another was like, night and day. So, you know, employees in headquarters were like, “Yeah, SharePoint's great.” Employees in branch offices were like, “This thing is terrible,” you know? “It's so slow. I hate it, I hate it, I hate it.”And so, Cloud actually became, like, this neutral location to move SharePoint to that kind of had an equal performance for every office. And so, that was, I think, one of the reasons, and it was also, you know—it had capacity problems, and customers were, right at that point, uploading tons of static documents to it, like Word documents, Office attachments, et cetera, and so they were starting to have some of these, like, real disk sprawl problems with SharePoint. So, that was kind of the month one problem. And only after they get through kind of month two, three, and four, and they go through, “I don't understand my bill,” and, “Help me understand security implications,” then they think about, like, “Hey, should we go back and look at how we're running that SharePoint stuff and maybe do it more efficiently and, like, move those static Office documents onto S3?” And so on, and so on.And that's kind of one of the big things that I would say has changed and is very different from, like, 2011 to now: there's enough sophistication around understanding that, like, you don't just translate what you're doing in your office or in your data center to what you're doing on cloud. Or if you do, you're not getting the most out of your investment.Corey: I'm curious to get your take on how you have seen cloud adoption patterns differ, specifically tied to geo. I mean, I tend to see it from a world where there's a bifurcation of between born-in-the-cloud SaaS-type companies where one workload is 80% of their bill or whatnot, and of the big enterprises where the largest single component is 3%. So, it's a very different slice there. But I'm curious what you would see from a sales perspective, looking across a lot of different geographic boundaries because we're all, on some level, biased based upon where we tend to spend our time doing business. I'm in San Francisco, which is its very own strange universe that has a certain perspective about itself that is occasionally accurate, but not usually. But it's a big world out there.Jeremy: It is. 
One thing that I would say is interesting: I spent my AWS days based in Singapore, living in Singapore at the time, and I was working with customers across Southeast Asia. And to your point, Corey, one of the most interesting things was this little bit of a leapfrog effect. Data centers in Asia-Pac, especially in places like the Philippines, were just terrible.You know, the Philippines had, like, the second-highest electricity rates in Asia at the time, only behind Japan, even though the GDP per capita gap between those two countries is really large. And yet you're paying, like, these super-high electricity rates. Secondarily, data centers in the Philippines were prone to flooding. And so, a lot of companies in the Philippines never went the data center route. You know, they just hosted servers in their offices, you know, they had a bunch of desktop machines in a cubicle, that kind of situation because, like, data centers themselves were cost-prohibitive.So, you saw this effect a little bit like cell phones in a lot of the developing world. Landline infrastructure was too expensive or never got done for whatever reason, and people went straight to cell phones. So actually, what I saw in a lot of emerging markets in Asia was, screw the data center; we're going to go straight to cloud. So, I saw a lot of Asia-Pac get a little bit ahead of places like Europe where you had, for instance, a lot of long-term data center contracts and you had customers really locked in. And we saw this over the next—let's say between, like, 2014 and 2018—when I was working with a systems integrator, and then started working on cloud security.We saw that US customers and Asia-Pac customers didn't have these obligations; European customers, a lot of them were still working off their lease, and still, you know, I'm locked into, let's just say, Equinix Frankfurt for another five years before I can think about cloud migration. So, that's definitely one aspect that I observed. Second thing I think is, like, the earlier you started, the earlier you reached the point where you realize that actually there is value in a lot of managed services and there actually is value in getting away from the kind of server mindset around EC2.Corey: It feels like there's a lot of, I want to call it legacy thinking, in some ways, except that's unfair because legacy remains a condescending engineering term for something that makes money. The problem that you have is that you get bound by choices you didn't necessarily realize you were making, and then something becomes revenue-bearing. And now there's a different way to do it, or you learn more about the platform, or the platform itself evolves, and, “Oh, I'm going to rewrite everything to take advantage of this,” isn't happening. So, it winds up feeling like, yeah, we're treating the cloud like a data center. And sometimes that's right; sometimes that's a problem, but ultimately, it still becomes a significant challenge. I mean, there's no way around it. And I don't know what the right answer is, I don't know what the fix is going to be, but it always feels like I'm doing something wrong somewhere.Jeremy: I think a lot of customers go through that same set of feelings and they realize that they have the active runway problem, where you know, how do you do maintenance on an active runway? You kind of can't because you've got flights going in and out. 
And I think you're seeing this in your part of the world at SFO with a lot of the work that got done in, like, 2018, 2019 where they kind of had to close down a runway and had, like, near misses because they consolidated all flights onto the one active runway, right? It is a challenge. And I actually think that some of the evolution that I've seen our customers go through over the last, like, two, three years, is starting to get away from that challenge.So, to your point, when you have revenue-bearing workloads that you can't really modify and things are pretty tightly coupled, it is very hard to make change. But when you start to have it where things are broken down into more microservices, it makes it a lot easier to cycle out Service A for Service B, or let's say more accurately, Service A1 with Service A2 where you can kind of just, like, plug and play different APIs, and maybe, you know, repoint services at the new stuff as they come online. But getting to that point is definitely a painful process. It does require architectural changes and often those architectural changes aren't at the infrastructure level; they're actually inside the application or they're between things like applications and third-party dependencies where the customers may not have full control over the dependencies, and that does become a real challenge for people to break down and start to attack. You've heard of the Strangler Methodology?Corey: Oh, yes. Both in terms of the Boston Strangler, as well—Jeremy: [laugh]. Right.Corey: As the Strangler design pattern.Jeremy: Yeah, yeah. But I think, like, getting to that is challenging. Once you understand that you want to do that, it makes a lot of sense, but getting to the starting point for that journey can be really challenging for a lot of customers because it involves stakeholders that are often not involved in infrastructure conversations, and organizational dysfunction can really creep in there, where you have teams that don't necessarily play nice together, not for any particular reason, but just because historically they haven't had to. So, that's something that I've seen and definitely takes a little bit of cultural work to overcome.Corey: When you take a look across the board of cloud adoption, it's interesting to have seen the patterns that wound up unfolding. Your career path, though, seems to have gotten away from selling cloud and into some strange directions leading up to what you're doing now, where you founded Firetail. What do you folks do?Jeremy: We do API security. And it really is kind of the culmination of, like, the last several years and what we saw. I mean, to your point, we saw customers going through kind of Phase One, Two, Three of cloud adoption. Phase One, the, you know, for lack of a better phrase, lift-and-shift, and Phase Two, the kind of first step on the path towards quote-unquote, “Enlightenment,” where they start to see that, like, actually, we can get better operational efficiency if we, you know, move our databases off of EC2 and onto RDS and we move our static content onto S3.And then Phase Three, where they realize actually EC2 kind of sucks, and it's a lot of management overhead, it's a lot of attack surface, I hate having to bake AMIs. What I really want to do is just drop some code on a platform and run my application. And that might be serverless. That might be containerized, et cetera. 
But one path or the other, where we pretty much always see customers ending up is with an API sitting on a network.And that API is doing two things: it front-ends a data set and it front-ends a set of functionality, in most cases. And so, what that really means is that the thing that sits on the network, that does represent the attack surface—both in terms of accessing data or in terms of, let's say, like, abusing an application—is an API. And that's what led us to where I am today, what led me and my co-founder Riley to, you know, start the company and try to make it easier for customers to build more secure APIs. So yeah, that's kind of the change that I've observed over the last few years that really, as you said, led to what I'm doing now.Corey: There is a lot of, I guess, challenge in the entire space when we bound that to—even API security, though; as soon as you go down the security path it starts seeming like there's a massive problem, just in terms of the proliferation of companies that each do different things, that each focus on different parts of the story. It feels like everything winds up spitting out huge amounts of security-focused, or at least security-adjacent telemetry. Everything has findings on top of that, and at least in the AWS universe, “Oh, we have a service that spits out a lot of that stuff. We're going to launch another service on top of it that, of course, cost more money that then winds up organizing it for you. And then another service on top of that that does the same thing yet again.” And it feels like we're building a tower of these things that just… shouldn't this just be a feature in the original underlying thing that turns down the noise? “Well, yes, but then we couldn't sell you three more things around it.”Jeremy: Yeah, I mean—Corey: Agree? Disagree?Jeremy: I don't entirely disagree. I think there is a lot of validity on what you just said there. I mean, if you look at like the proliferation of even the security services, and you see GuardDuty and Config and Security Hub, or things like log analysis with Athena or log analysis with an ELK stack, or OpenSearch, et cetera, I mean, you see all this proliferation of services around that. I do think the thing to bear in mind is that for most customers, like, security is not one-size-fits-all. Security is fundamentally kind of a risk management exercise, right? If it wasn't a risk management exercise, then all that security would really be about is, like, keeping your data off of networks and making sure that, like, none of your data could ever leave.But that's not how companies work. They do interact with the outside world and so then you kind of always have this decision and this trade-off to make about how much data you expose. And so, when you have that decision, then it leads you down a path of determining what data is important to your organization and what would be most critical if it were breached. And so, the point of all of that is honestly that, like, security is not the same for you as it is for me, right? 
And so, to that end, you might be all about Security Hub and Config and basic checks across all your accounts and all your active regions, and I might be much more about—let's say I'm quote-unquote, “digital-native, cloud-native,” blah, blah, blah—I really care about detection and response on top of events.And so, I only care about log aggregation and, let's say, GuardDuty or Athena analysis on top of that because I feel like I've got all of my security configurations in Infrastructure as Code. So, there's not a right and wrong answer and I do think that's part of why there are a gazillion security services out there.Corey: On some level, I've been of the opinion for a while now that the cloud providers themselves should not necessarily be selling security services directly because, on some level, that becomes an inherent conflict of interest. Why make the underlying platform more secure or easier to use from a security standpoint when you can now turn that into a revenue source? I used to make comments that Microsoft Defender was a classic example of getting this right because they didn't charge for it and a bunch of antivirus companies screamed and whined about it. And then of course, Microsoft's like, “Oh, Corey saying nice things about us. We can't have that.” And they started charging for it. So okay, that more or less completely subverts my entire point. But it still feels squicky.Jeremy: I mean, I kind of doubt that's why they started charging for it. But—Corey: Oh, I refuse to accept that I'm not that influential. There we are.Jeremy: [laugh]. Fair enough.Corey: Yeah, I just can't get away from the idea that it feels squicky when the company providing the infrastructure now makes doing the secure thing on top of it into an investment decision.Jeremy: Yeah.Corey: “Do you want the crappy, insecure version of what we build or do you want the top-of-the-line secure version?” That shouldn't be a choice people have to make. Because people don't care about security until right after they really should have cared about security.Jeremy: Yeah. Look, and I think the changes to S3 configuration, for instance, kind of bear out your point. Like, it shouldn't be the case that you have to go through a lot of extra steps to not make your S3 data public, it should always be the case that, like, you have to go through a lot of steps if you want to expose your data. And then you have explicitly made a set of choices on your own to make some data public, right? So, I kind of agree with the underlying logic. I think the counterargument, if there is one to be made, is that it's not up to them to define what is and is not right for your organization.Because again, going back to my example, what is secure for you may not be secure for me because we might have very different modes of operation, we might have very different modes of building our infrastructure, deploying our infrastructure, et cetera. And I think every cloud provider would tell you, “Hey, we're just here to enable customers.” Now, do I think that they could be doing more? Do I think that they could have more secure defaults? You know, in general, yes, of course, they could. And really, like, the fundamentals of what I worry about are people building insecure applications, not so much people deploying infrastructure with bad configurations.Corey: It's funny, we talk about this now. 
Earlier today, I was lamenting some of the detritus from some of my earlier builds, where I've been running some of these things in my old legacy single account for a while now. And the build service is dramatically overscoped, just because trying to get the security permissions right, was an exercise in frustration at the time. It was, “Nope, that's not it. Nope, blocked again.”So, I finally said to hell with it, overscope it massively, and then with a, “Todo: fix this later,” which of course, never happened. And if there's ever a breach on something like that, I know that I'll have AWS wagging its finger at me and talking about the shared responsibility model, but it's really kind of a disaster plan of their own making because there's not a great way to say easily and explicitly—or honestly, by default the way Google Cloud does—of okay, by default, everything in this project can talk to everything in this project, but the outside world can't talk to any of it, which I think is where a lot of people start off. And the security purists love to say, “That's terrible. That won't work at a bank.” You're right, it won't, but a bank has a dedicated security apparatus, internally. They can address those things, whereas your individual student learner does not. And that's how you wind up with open S3 bucket monstrosities left and right.Jeremy: I think a lot of security fundamentalists would say that what you just described about that Google project structure, defeats zero trust, and you know, that on its own is actually a bad thing. I might counterargue and say that, like, hey, you can have a GCP project as a zero trust, like, first principle, you know? That can be the building block of zero trust for your organization and then it's up to you to explicitly create these trust relationships to other projects, and so on. But the thing that I think in what you said that really kind of does resonate with me in particular as an area that AWS—and really this case, just AWS—should have done better or should do better, is IAM permissions. Because every developer in the world that I know has had that exact experience that you described, which is, they get to a point where they're like, “Okay, this thing isn't working. It's probably something with IAM.”And then they try one thing, two things, and usually on the third or fourth try, they end up with a star permission, and maybe a comment in that IAM policy or maybe a Jira ticket that, you know, gets filed into backlog of, “Review those permissions at some point in the future,” which pretty much never happens. So, IAM in particular, I think, is one where, like, Amazon should do better, or should at least make it, like, easy for us to kind of graphically build an IAM policy that is scoped to least permissions required, et cetera. That one, I'll a hundred percent agree with your comments and your statement.Corey: As you take a look across the largest, I guess, environments you see, and as well as some of the folks who are just getting started in this space, it feels like, on some level, it's two different universes. Do you see points of commonality? Do you see that there is an opportunity to get the individual learner who's just starting on their cloud journey to do things that make sense without breaking the bank that they then can basically have instilled in them as they start scaling up as they enter corporate environments where security budgets are different orders of magnitude? 
Because it seems to me that my options for everything that I've looked at start at tens of thousands of dollars a year, or are a bunch of crappy things I find on GitHub somewhere. And it feels like there should be something between those two.Jeremy: In terms of training, or in terms of, like, tooling to build—Corey: In terms of security software across the board, which I know—Jeremy: Yeah.Corey: —is sort of a vague term. Like, I first discovered this when trying to find something to make sense of CloudTrail logs. It was a bunch of sketchy things off GitHub or a bunch of very expensive products. Same thing with VPC flow logs, same thing with trying to parse other security alerting and aggregate things in a sensible way. Like, very often it's, oh, there's a few very damning log lines surrounded by a million lines of nonsense that no one's going to look through. It's the needle in a haystack problem.Jeremy: Yeah, well, I'm really sorry if you spent much time trying to analyze VPC flow logs because that is just an exercise in futility. First of all, the level of information that's in them is pretty useless, and the SLA on actually, like, log delivery, A, whether it'll actually happen, and B, whether it will happen in a timely fashion is just pretty much non-existent. So—Corey: Oh, from a security perspective I agree wholeheartedly, but remember, I'm coming from a billing perspective, where it's—Jeremy: Ah, fair enough.Corey: —huh, we're taking a petabyte in and moving 300 petabytes between availability zones. It's great. It's a fun game called find whatever is chatty because, on some level, it's like, run two of whatever that is—or three—rather than having it replicate. What is the deal here? And just try to identify, especially in the godforsaken hellscape that is Kubernetes, what is that thing that's talking? And sometimes flow logs are the only real tool you've got, other than oral freaking tradition.Jeremy: But God forbid you forgot to tag your [ENI 00:24:53] so that the flow log can actually be attributed to, you know, what workload is responsible for it behind the scenes. And so yeah, I mean, I think that's a—boy that's a case study and, like, a miserable job that I don't think anybody would really want to have in this day and age.Corey: The timing of this is apt. I sent out my newsletter for the week a couple hours before this recording, and in the bottom section, I asked anyone who's got an interesting solution for solving what's talking to what with VPC flow logs, please let me know because I found this original thing that AWS put up as part of their workshops and a lab to figure this out, but other than that, it's more or less guess-and-check. What is the hotness? It's been a while since I explored the landscape. And now we see if the audience is helpful or disappoints me. It's all on you folks.Jeremy: Isn't the hotness to segregate every microservice into an account and run it through a load balancer so that it's like much more properly tagged and it's also consumable on an account-by-account basis for better attribution?Corey: And then everything you see winds up incurring a direct fee when passing through that load balancer, instead of the same thing within the same subnet being able to talk to one another for free.Jeremy: Yeah, yeah.Corey: So, at scale—so yes, for visibility, you're absolutely right. 
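For anyone who wants to play the “find whatever is chatty” game Corey describes, one common pattern is to query the flow logs with Athena and rank talker pairs by bytes. A minimal sketch, assuming flow logs are already delivered to S3 and registered as an Athena table; the table name vpc_flow_logs, the database, and the results bucket below are placeholders for this example, not anything AWS creates for you:

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Top 20 source/destination pairs by total bytes -- a first pass at
# spotting replication-style cross-AZ chatter.
query = """
SELECT srcaddr, dstaddr, SUM(bytes) AS total_bytes
FROM vpc_flow_logs
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 20
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
# Poll get_query_execution() with this ID, then fetch rows
# with get_query_results() once the query succeeds.
print(response["QueryExecutionId"])

Attribution from there is still manual (mapping addresses back to workloads), which is exactly the tagging pain Jeremy raises next.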
From a “I would like to spend less money giving it directly to Amazon” perspective, not so much.Jeremy: [unintelligible 00:26:08] spend more money for the joy of attribution of workload?Corey: Not to mention as well that coming into an environment that exists and is scaled out—which is sort of a prerequisite for me going in on a consulting project—and saying, “Oh, you should rebuild everything using serverless and microservice principles,” is a great way to get thrown out of the engagement in the first 20 minutes. Because yes, in theory, anyone can design something great, that works, that solves a problem on a whiteboard, but most of us don't get to throw the old thing away and build fresh. And when we do—great, I'm greenfielding something—there's always constraints and challenges down the road that you don't see coming. So, you finally wind up building the most extensible thing in the universe that can handle all these things, and your business dies before you get to MVP because that takes time, energy and effort. There are many more companies that have died due to failure to find product-market fit than have died because, “Oh hey, your software architecture was terrible.” If you hit the market correctly, there is budget to fix these things down the road, whereas your code could be pristine and your company's still dead.Jeremy: Yeah. I don't really have a solution for you on that one, Corey [laugh].Corey: [laugh].Jeremy: I will come back to your one question—Corey: I was hoping you did.Jeremy: Yeah, sorry. I will come back to the question about, you know, how should people kind of get started in thinking about assessing security. And you know, to your point, look, I mean, I think Config is a low-ish cost, but should it cost anything? Probably not, at least for, like, basic CIS foundation benchmark checks. I mean, like, if the best practice that Amazon tells everybody is, “Turn on these 40-ish checks at last count,” you know, maybe those 40-ish checks should just be free and included and on in everybody's account for any account that you tag as production, right?Like, I will wholeheartedly agree with that sentiment, and it would be a trivial thing for Amazon to do, with one kind of caveat—and this is something that I think a lot of people don't necessarily understand—collecting all the required data for security is actually really expensive. Security is an extremely data-intensive thing at this day and age. And I have a former coworker who used to hate the expression that security is data science, but there is some truth in it at this point, other than that the magic around it is not actually that big, because there's not a lot of, let's say, heuristic analysis or magic that goes into the queries, et cetera. A lot of security is very rule-based. It's a lot of, you know, just binary checks: is this bit set to zero or one?And some of those things are like relatively simple, but what ends up inevitably happening is that customers want more out of it. They don't just want to know, is my security good or bad? They want to know things like is it good or bad now relative to last week? Has it gotten better or worse over time? 
And so, then you start accumulating lots of data and time series data, and that becomes really expensive.And secondarily, the thing that's really starting to happen more and more in the security world is correlation of multiple layers of data, infrastructure with applications, infrastructure with operating system, infrastructure with OS and app vulnerabilities, infrastructure plus vulnerabilities plus Kubernetes configurations plus API sitting at the edge of that. Because realistically, like, for so many organizations that are built out at scale, the truth of the matter is, just on their operating system vulnerabilities alone, they're going to have tens of thousands, if not millions, of individual items to deal with, and no human can realistically prioritize those without some context around it. And that is where the data management, kind of, becomes really expensive.Corey: I hear you. Particularly the complaints about AWS Config, which many things, like Control Tower, set up for you. And on some level, it is a tax on using the cloud as the cloud should be used because it charges for evaluation of changes to your environment. So, if you're spinning things up all the time and then turning them down when they're not in use, that incurs a bunch of Config charges, whereas if you treat it like a big dumb version of your data center where you just spin [unintelligible 00:30:13] things forever, your Config charge is nice and low. When you start seeing it entering the top ten of your spend on services, something is very wrong somewhere.Jeremy: Yeah. I would actually say, like, a good compromise in my mind would be that it should be included with something like business support. If you pay for support with AWS, why not include Config, or some level of Config, for all the accounts that are in scope for your production support? That would seem like a very reasonable compromise.Corey: A lot of folks have it enabled but don't see any direct value from it either, so it's one of those things where not knowing how to turn it off becomes a tax on what you're doing, in some cases. And SCPs—often put in place by Control Tower—don't allow you to do that. So, you're training people who are learning this in their test environments to avoid it, but you want them to be using it at scale in an enterprise environment. So, I agree with you, there has to be a better way to deliver that value to customers. Because, yeah, if this thing is now, you know, 3 or 4% of your cloud bill, it's not adding that much value, folks.Jeremy: Yeah, one thing I will say just on that point, and, like, it's a super small semantic nitpick that I have, I hate when people talk about security as a tax because I think it tends to kind of engender the wrong types of relationships to security. Because if you think about taxes, two things about them, I mean, one is that they're kind of prescribed for you, and so in some sense, this kind of Control Tower implementation is similar because, like you know, it's hard for you to turn off, et cetera, but on the other hand, like, you don't get to choose how that tax money is spent. And really, like, you get to set your security budget as an organization. Maybe this Control Tower Config scenario is a slight outlier on that side, but you know, there are ways to turn it off, et cetera.The other thing, though, is that, like, people tend to relate to tax, like, this thing that they really, really hate. 
It comes once a year, you should really do everything you can to minimize it and to, like, not spend any time on it or on getting it right. And in fact, like, there's a lot of people who kind of like to cheat on taxes, right? And so, like, you don't really want people to have that kind of mindset of, like, pay as little as possible, spend as little time as possible, and yes, let's cheat on it. Like, that's not how I hope people are addressing security in their cloud environments.Corey: I agree wholeheartedly, but if you have a service like Config, for example—that's what we're talking about—and it isn't adding value to you, and you just don't know what it does, how it works, then it [unintelligible 00:32:37]—or more or less how to turn it off—then it does effectively become directly in line with a tax, regardless of how people want to view the principle of taxation. It's a—yeah, security should not be a tax. I agree with you wholeheartedly. The problem is, is it is—Jeremy: It should be an enabler.Corey: —unclea—yeah, the relationship between Config and security in many cases is fairly attenuated in a lot of people's minds.Jeremy: Yeah. I mean, I think if you don't have, kind of, ideas in mind for how you want to use it or consume it—let's say as an assessment against your own environment—then it's particularly vexing. So, if you don't know, like, “Hey, I'm going to use Config. I'm going to use Config for this set of rules. This is how I'm going to consume that data and how I'm going to then, like, pass the results on to people to make change in the organization,” then it's particularly useless.Corey: Yeah. I really want to thank you for taking the time to speak with me. If people want to learn more, where's the best place for them to find you?Jeremy: Easy, breezy. We are just firetail.io. That's ‘fire' like the, you know, flaming substance, and ‘tail' like the tail of an animal, not ‘tale' like a story. But yeah, just firetail.io.And if you come now, we've actually got, like, a white paper that we just put out around API security and kind of analyzing ten years of API-based data breaches and trying to understand what actually went wrong in most of those cases. And you're more than welcome to grab that off of our website. And if you have any questions, just reach out to me. I'm just jeremy@firetail.io.Corey: And we'll put links to all of that in the [show notes 00:34:03]. Thank you so much for your time. I appreciate it.Jeremy: My pleasure, Corey. Thanks so much for having me.Corey: Jeremy Snyder, founder and CEO at Firetail. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment pointing out that listening to my nonsense is a tax on you going about your day.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
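Since the episode closes on Config quietly creeping into the top ten of the bill, the Config APIs themselves are a quick way to take stock of what it is actually doing in an account. A minimal read-only sketch, assuming boto3 and default credentials; nothing here changes any settings:

import boto3

config = boto3.client("config")

# Is the recorder on, and what is it recording? Recording every
# resource type means every short-lived resource generates billable
# configuration items.
for recorder in config.describe_configuration_recorders()["ConfigurationRecorders"]:
    print(recorder["name"], recorder.get("recordingGroup", {}))

for status in config.describe_configuration_recorder_status()["ConfigurationRecordersStatus"]:
    print(status["name"], "recording:", status["recording"])

# How many rules are evaluating changes? (First page only; paginate
# for accounts with many rules.)
rules = config.describe_config_rules()["ConfigRules"]
print(len(rules), "Config rules active")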
A week loaded with announcements, a bit more than usual. As every week, I made a selection, necessarily incomplete and necessarily biased. I try to pick the announcements that matter most to builders, the developers, those of you who design, build, and manage applications or infrastructure on AWS. This week, I talk about security with Inspector for Lambda and Detective, and we bury bastion hosts and SSH key rotation. More security with the CloudTrail Lake Dashboard. There is also a new service that lets you manage application permissions inside your applications, and we'll talk about double encryption for S3. I'll finish with a new API for SQS and a nice open-source project around generative AI for migrating your Stable Diffusion workloads to EC2 and SageMaker. We go through all of it in the podcast
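The “double encryption for S3” item is presumably dual-layer server-side encryption with KMS keys (DSSE-KMS), which applies two independent layers of AES-256 encryption on upload. A minimal sketch with boto3; the bucket name and key alias are placeholders:

import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-sensitive-bucket",
    Key="report.csv",
    Body=b"col1,col2\n1,2\n",
    # Two independent layers of encryption, both through KMS.
    ServerSideEncryption="aws:kms:dsse",
    SSEKMSKeyId="alias/my-app-key",
)

Unlike plain SSE-KMS, DSSE-KMS is aimed at workloads with compliance requirements for multiple independent layers of encryption, and it carries an additional per-GB charge.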
Watch on YouTube About the show Sponsored by InfluxDB from Influxdata. Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Michael #1: pystack PyStack is a tool that uses forbidden magic to let you inspect the stack frames of a running Python process or a Python core dump, helping you quickly and easily learn what it's doing. PyStack has the following amazing features:
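As a quick illustration of where PyStack fits, here is a toy program that hangs, plus the documented commands you would point at it from another shell; the sleep and PID printing are just conveniences for this example:

import os
import time

# Toy "stuck" program to inspect. Install the tool with: pip install pystack
# Then, from another shell:
#   pystack remote <pid>       # dump the live process's Python stack frames
#   pystack core <core-file>   # or analyze a core dump after a crash

def inner():
    print(f"PID: {os.getpid()}")  # pass this PID to `pystack remote`
    time.sleep(300)               # simulate a hang worth investigating

def outer():
    inner()

if __name__ == "__main__":
    outer()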
Mike Brevoort, Chief Product Officer at Gitpod, joins Corey on Screaming in the Cloud to discuss all the intricacies of remote development and how Gitpod is simplifying the process. Mike explains why he feels the infinite resources cloud provides can be overlooked when discussing remote versus local development environments, and how simplifying build abstractions is a fantastic goal, but that focusing on the tools you use in a build abstraction in the meantime can be valuable. Corey and Mike also dive into the security concerns that come with remote development, and Mike reveals the upcoming plans for Gitpod's local conference environment, CDE Universe. About MikeMike has a passion for empowering people to be creative and work together more effectively. He is the Chief Product Officer at Gitpod striving to remove the friction and drudgery from software development through Cloud Developer Environments. He spent the previous four years at Slack where he created Workflow Builder and “Platform 2.0” after his company Missions was acquired by Slack in 2018. Mike lives in Denver, Colorado and enjoys cycling, hiking and being outdoors.Links Referenced: Gitpod: https://www.gitpod.io/ CDE Universe: https://cdeuniverse.com/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: It's easy to **BEEP** up on AWS. Especially when you're managing your cloud environment on your own!Mission Cloud un **BEEP**s your apps and servers. Whatever you need in AWS, we can do it. Head to missioncloud.com for the AWS expertise you need. Corey: Have you listened to the new season of Traceroute yet? Traceroute is a tech podcast that peels back the layers of the stack to tell the real, human stories about how the inner workings of our digital world affect our lives in ways you may have never thought of before. Listen and follow Traceroute on your favorite platform, or learn more about Traceroute at origins.dev. My thanks to them for sponsoring this ridiculous podcast. Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I have had loud, angry, and admittedly at times uninformed opinions about so many things over the past few years, but something that predates that a lot is my impression on the idea of using remote systems for development work as opposed to doing local dev, and that extends to build and the rest. And my guest today here to argue with me about some of it—or agree; we'll find out—is Mike Brevoort, Chief Product Officer at Gitpod, which I will henceforth be mispronouncing as JIT-pod because that is the type of jerk I am. Mike, thank you for joining me.Mike: Thank you for insulting my company. I appreciate it.Corey: No, by all means, it's what we do here.Mike: [laugh].Corey: So, you clearly have opinions on the idea of remote versus local development that—I am using the word remote development; I know you folks like to use the word cloud, in place of remote, but I'm curious to figure out is, is that just the zeitgeist that has shifted? Do you have a belief that it should be in particular places, done in certain ways, et cetera? Where do your opinion on this start and stop?Mike: I think that—I mean, remote is accurate, an accurate description. 
I don't like to emphasize the word remote because I don't think it's important whether it's remote or local. I think that the term cloud connotes different values around the elasticity of environments and the resources that are more than what you might have on your local machine versus a remote machine. It's not so much whether the one machine is local or remote as it is that there are infinite numbers of resources that you can develop across in the cloud. That's why we tend to prefer our cloud development environments.Corey: From my perspective, I've been spending too many years now living in basically hotels and airports. And when I was doing that, for a long time, the only computer I brought with me was my iPad Pro. That used to be a little bit on the challenging side and these days, that's gotten capable enough where it's no longer interesting in isolation. But there's no local development environment that is worth basically anything on that. So, I've been SSHing into things and using VI as my development environment for many years.When I started off as a grumpy Unix sysadmin, there was something reassuring about the latest state of whatever it is I'm working on living in a data center somewhere rather than on a laptop I'm about to leave behind in a coffee shop because I'm careless. So, there's a definite value and a sense that I am doing something virtuous, historically. But it didn't occur to me till I started talking to people about this, just how contentious the idea was. People would love to raise all kinds of fun objections to this, where it was, “Oh, well, what about when you're on a plane and need to do work?” It's, well, I spend an awful lot of time on planes and that is not a limiting factor in me writing the terrible nonsense that I will charitably call code, in my case. I just don't find that that idea holds up anywhere. The world has become so increasingly interconnected that that seems unlikely. But I do live in San Francisco, so here, every internet is generally pretty decent; not every place is. What are your thoughts?Mike: I agree. I mean, I think one thing is, I would just like not to think about it, whether I can or can't develop because I'm connected or not. And I think that we tend to be in a world where that is moreso the case. And I think a lot of times when you're not connected, you become reconnected soon, like if your connection is not reliable or if you're going in and out of connectivity issues. And when you're trying to work on a local laptop and you're connecting and disconnecting, it's not like we develop these days with everything just isolated on our local laptop—especially since we talk about cloud a lot on this podcast—and a lot of apps now go way beyond just I'm running a process on my machine and I'm connecting to data on my machine.There are local emulators you could use for some of these services, but most of them are inferior. And if you're using SQS or using any other, like, cloud-based service, you're usually, as a developer, connecting to some version of that and if you're disconnected anyway, you're not productive either. And so, I find that it's just, like, an irrelevant conversation in this new world. And the way we've developed traditionally—this view of I-need-to-pile-everything-onto-my-laptop to be able to develop and be productive—has not, like, followed along with the trend that moved into the cloud.Corey: Right. 
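Mike's point about local emulators usually shows up in code as an endpoint override: the same client code talks to an emulator on your machine or to real SQS, depending on configuration. A minimal sketch, assuming a LocalStack-style emulator on localhost:4566; the environment variable and queue name here are conventions for this example, not anything the SDK requires:

import os
import boto3

# Point at the emulator only when the env var is set; endpoint_url=None
# means the real AWS endpoint is used.
endpoint = os.environ.get("AWS_ENDPOINT_URL")  # e.g. "http://localhost:4566"

sqs = boto3.client("sqs", endpoint_url=endpoint)

queue_url = sqs.create_queue(QueueName="dev-test-queue")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody="hello from a dev environment")
print(sqs.receive_message(QueueUrl=queue_url).get("Messages", []))

The catch Mike describes is fidelity: the emulator's behavior drifts from the real service, which is part of the argument for developing against real cloud resources from a cloud development environment instead.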
The big problem for a long time has been, how do I make this Mac or Windows laptop look a lot like a Linux EC2 instance? And there have been a bunch of challenges and incompatibility issues and the rest, and from my perspective, I like to develop in an environment that at least vaguely resembles the production environment it's going to run in, which in AWS's case, of course, comes down to expensive. Bu-dum-tss.Mike: Yeah, it's a really big challenge. It's been a challenge, right? When you've worked with coworkers that were on a Windows machine and you were on a Mac machine, and you had the one person on their Linux machine forever, and we all struggled with trying to mimic these development environments that were representative, ultimately, of what we would run in production. And if you're counting costs, we can count the cost of those cloud resources, we can count the cost of those laptops, but we also need to count the cost of the people who are using those laptops and how inefficient and how much churn they have, and how… I don't know—for years of my career, someone would show up every morning to the stand-up meeting and say, “Well, I wasted all afternoon yesterday trying to work out my, you know, issues with my development environment. I hope I get that sorted out later today and I hope someone can help me.”And so, I think cost is one thing. I think that there's a lot of inconsistencies that lead to a lot of inefficiencies and churn. And I think that, regardless of where you're developing, the more that you can make your environments more consistent and sound, not for you, but for your own team and have those be more representative of what you are running in production, the better.Corey: We should disambiguate here because I fear this is one of the areas where my use case tends to veer off into the trees, which is I tend to operate largely in isolation, from a development point of view. I build small, micro things that wind up doing one thing, poorly. And that is, like, what I do is a proof of concept, or to be funny, or to kick the tires on a new technology. I'll also run a bunch of random things I find off of JIF-ub—yes, that's how I pronounce GitHub. And that's great, but it also feels like I'm learning as a result, every stack, and every language, in every various version that it has, and very few of the cloud development environments that I've seen really seem to cater to the idea that simultaneously, I want to have certain affordances in my shell environment set up the way that I want them, tab complete this particular suite of tools generically across the board, but then reset to that baseline and go in a bunch of different directions of, today, it's Python in this version and tomorrow, it's Node in this other version, and day three, what is a TypeScript anyway, and so on and so forth.It feels like, in most cases, you either get this generic, one-size-fits-everyone-in-this-company, for-this-project approach, or it's, here's a very baseline, untuned thing that does not have any of your dependencies installed. Start from scratch every time. And it feels like there are two paths, and they both suck. Where are you folks at these days on that spectrum?Mike: Yeah, I think that, you know, one, if you do all of that development across all these different libraries and technology stacks and you're downloading all these repos from JIF-hub—I say it right—and you're experimenting, you tend to have a lot of just collision of things. 
Like if you're using Python, it's, like, really a pain to maintain isolation across projects and not have—like, your environment is, like, one big bucket of things on your laptop and it's very easy to get that into a state where things aren't working, and then you're struggling. There's no big reset on your laptop. I mean, there is, but it takes—it's a full reset of everything that you have.And I think the thing that's interesting to me about cloud development environments is I could spin one of these up, I could trash it to all hell and just throw it away and get another one. And I could get another one of those at a base of which has been tuned for whatever project or technology I'm working on. So, I could take—you know, put in the effort to pre-set up environments: one that is set up with all of my, like, Python tooling, and another one that's set up with all my, like, Go or Rust tooling, or our front-end development, even as a base repo for what I tend to do or might tend to experiment with. What we find is that, whether you're working alone or you're working with coworkers, that setting up a project and all the resources and the modules and the libraries and the dependencies that you have, like, someone has to do that work to wire that up together and the fact that you could just get an environment and get another one and another one, we use this analogy of, like, tissue boxes where, like, you should just be able to pull a new dev environment out of a tissue box and use it and throw it away and pull as many tissues out of the box as you want. And they should be, like, cheap and ephemeral because—and they shouldn't be long-lived because they shouldn't be able to drift.And whether you're working alone or you're working in a team, it's the same value. The fact that, like, I could pull one of these out and I have it; I'm confident in what I got. Like for example, ideally, you would just start a dev environment, it's available instantly, and you're ready to code. You're in this project with—and maybe it's a project you've never developed on. Maybe it's an open-source project.This is where I think it really improves the sort of equitability of being able to develop, whether it's in open-source, whether it's inner-source in companies, being able to approach any project with a click of a button and get the same environment that the tech lead on the project who started it five years ago has, and then I don't need to worry about that and I get the same environment. And I think that's the value. And so, whether you're individual or you're on a team, you want to be able to experiment and thrash and do things and be able to throw it away and start over again, and not have to—like for example, maybe you're doing that on your machine and you're working on this thing and then you actually have to do some real work, and now you've done something that conflicts with the thing that you're working on and you're just kind of caught in this tangled mess, where it's like, you should just be able to leave that experiment there and just go work on the thing you need to work on. And why can't you have multiples of these things at any given time?Corey: Right. One of the things I loved about EC2 dev environments has been that I can just spin stuff up and okay, great, it's time for a new project. Spin up another one and turn it off when I'm done using it—which is the lie we always tell ourselves in cloud and get charged for things we forget to turn off. But then, okay, I need an Intel box one day. Done. 
Great, awesome. I don't have any of those lying around here anymore but clickety, clickety, and now I do.It's nice being able to have that flexibility, but it's also sometimes disconcerting when I'm trying to figure out what machine I was on when I was building things and the rest, and having unified stories around this becomes super helpful. I'm also finding that my overpowered desktop is far more cost-efficient when I need to compile something challenging, as opposed to finding a big, beefy EC2 box for that thing as well. So, much of the time, what my remote system is doing is sitting there bored. Even when I'm developing on it, it doesn't take a lot of modern computer resources to basically handle a text editor. Unless it's Emacs, in which case, that's neither here nor there.Mike: [laugh]. I think that the thing that becomes costly, especially when using cloud development environments, is when you have to continue to run them even when you're not using them for the sake of convenience because you're not done with it, you're in the middle of doing some work and it still has to run or you forget to shut it off. If you are going to just spin up a really beefy EC2 instance for an hour to do that big compile and it costs you 78 cents, that's one thing. I mean, I guess that adds up over time and yes, if you've already bought that Mac Studio that's sitting under your desk, humming, it's going to be more cost-efficient to use that thing.But there's, like, an element of convenience here that, like, what if I haven't bought the Mac Studio, but I still need to do that big beefy compilation? And maybe it's not on a project I work on every single day; maybe it's the one that I'm just trying to help out with or just starting to contribute to. And so, I think that we need to get better about, and something that we're very focused on at JIT-pod, is—Gitpod—is—Corey: [laugh]. I'm going to get you in trouble at this rate.Mike: —[laugh]—is really to optimize that underlying runtime environment so that we can optimize the resources that you're using only when you're using it, but also provide a great user experience. Which is, for me, as someone who's responsible for the product at Gitpod, the thing I want to get to is that you never have to think about a machine. You're not thinking about this dev environment as something that lives somewhere, that you're paying for, that there's a meter spinning that if you forget it, that you're like, ah, it's going to cost me a lot of money, that I have to worry about ever losing it. And really, I just want to be able to get a new environment, have one, use it, come back to it when I need it, have it not cost me a lot of money, and be able to have five or ten of those at a time because I'm not as worried about what it's going to cost me. And I'm sure it'll cost something, but the convenience factor of being able to get one instantly and have it and not have to worry about it ultimately saves me a lot of time and aggravation and improves my ability to focus and get work done.And right now, we're still in this mode where we're still thinking about, is it on my laptop? Is it remote? Is it on this EC2 instance or that EC2 instance? Or is this thing started or stopped? 
And I think we need to move beyond that and be able to just think of these things as development environments that I use and need and they're there when I want to, when I need to work on them, and I don't have to tend to them like cattle.Corey: Speaking of tending large things in herds—I guess that's sort of the most tortured analogy slash segue I've come up with recently—you folks have a conference coming up soon in San Francisco. What's the deal with that? And I'll point out, it's all on-site, locally, not in the cloud. So, hmm…Mike: Yeah, so we have a local conference environment, a local conference that we're hosting in San Francisco called CDE Universe on June 1st and 2nd, and we are assembling all the thought leaders in the industry who want to get together and talk about where not just cloud development is going, but really where development is going. And so, there's us, there's a lot of companies that have done this themselves. Like, before I joined Gitpod, I was at Slack for four years and I got to see the transition to, sort of, remote development hosted on EC2 instances and how that really empowered our team of hundreds of engineers to be able to contribute and, like, work together better, more efficiently, to run this giant app that you can't run just alone on your laptop. And so, Slack is going to be there, they're going to be talking about their transition to cloud development. The Uber team is going to be there, there's going to be some other companies.So, Nathan, who's building Zed—he was the one that originally built Atom at GitHub and is now building Zed, which is a new IDE—is going to be there. And I can't mention all the speakers, but there's going to be a lot of people that are really looking at how do we drive forward development and development environments. And that experience can get a lot better. So, if you're interested in that, if you're going to be in San Francisco on June 1st and 2nd and want to talk to these people, learn from them, and help us drive this vision forward for just a better development experience, come hang out with us.Corey: I'm a big fan of collaborating with folks and figuring out what tricks and tips they've picked up along the way. And this is coming from the perspective of someone who acts as a solo developer in many cases. But it always drove me a little nuts when you see people spending weeks of their lives configuring their text editor—VIM in my case because I'm no better than these people; I am one of them—and getting it all set up and dialed in. It's, how much productivity are you gaining versus how much time are you spending getting there?And then when all was said and done a few years ago, I found myself switching to VS Code for most of what I do, and—because it's great—and suddenly the world's shifting on its axis again. At some point, you want to get away from focusing on productivity on an individualized basis. Now, the rules change when you're talking about large teams where everyone needs a copy of this running locally or in their dev environment, wherever that happens to be, and you're right, often the first two weeks of a new software engineering job are, you're now responsible for updating the onboarding docs because it's been ten minutes since the last time someone went through it. And oh, the versions bumped again of what we would have [unintelligible 00:16:44] brew install on a Mac and suddenly everything's broken. Yay. 
I don't miss those days.Mike: Yeah, the new, like, ARM-based Macs came out and then you were—now all of a sudden, all your builds are broken. We hear that a lot.Corey: Oh, what I love now is that, in many cases, I'm still in a process of, okay, I'm developing locally on an ARM-based Mac and I'm deploying it to a Graviton2-based Lambda or instance, but the CI/CD builder is going to run on Intel, so it's one of those, what is going on here? Like, there's a toolchain lag around embracing ARM as an architecture. That's mostly been taken care of as things have evolved, but it's gotten pretty amusing at points, just how quickly that baseline architecture has shifted for some workloads. And for some companies.Mike: Yeah, and things just seem to be getting more [laugh] and more complicated, not less complicated, and so I think the more that we can—Corey: Oh, you noticed?Mike: Try to simplify build abstractions [laugh], you know, the better. But I think in those cases, I think it's actually good for people to struggle with setting up their environment sometimes, to care about the tools that they use and their experience developing. I think there has to be some ROI with that. If it's like a chronic thing that you have to continue to try to fix and make better, it's one thing, but if you spend a whole day improving the tools that you use to make you a better developer later, I think there's a ton of value in that. I think we should care a lot about the tools we use.However, that's not something we want to do every day. I mean, ultimately, I know I don't build software for the sake of building software. I want to create something. I want to create some value, some change in the world. There's some product ultimately that I'm trying to build.And, you know, early on, I've done a lot of work in my career on, like, workflow-type builders and visual builders and I had this incorrect assumption somewhere along the way—and this came around, like, sort of the maker movement, when everybody was talking about everybody should learn how to code—and I made this assumption that everybody really wants to create; everybody wants to be a creator, and if given the opportunity, they will. And I think what I finally learned is that, actually, most people don't like to create. A lot of people just want to be served; like, they just want to consume and they don't want the hassle of it. Some people do, if they have the opportunity and the skillsets, too, but it's also similar to, like, if I'm a professional developer, I need to get my work done. I'm not measured on how well my local tooling is set up; I'm sort of measured on my output and the impact that I have in the organization.I tend to think about, like, chefs. If I'm a chef and I work 60 hours in a restaurant, 70 hours in a restaurant, the last thing I want to do is come home and cook myself a meal. And most of the chefs I know actually don't have really nice kitchens at home. They, like, tend to want other people to cook for them. And so, I think, like, there's a place in a professional setting where you just need to get the work done and you don't want to worry about all the meta things and the time that you could waste on it.And so, I feel like there's a happy medium there. I think it's good for people to care about the tools that they use, the environment that they develop in, to really care for that and to curate it and make it better, but there's got to be some ROI and it's got to have value to you. You have to enjoy that. 
Otherwise, you know, what's the point of it in the first place?Corey: One thing that I used to think about was that if you're working in regulated industries, as I tended to a fair bit, there's something very nice about not having any of the data or IP or anything like that locally. Your laptop effectively just becomes a thin client to something that's already controlled by the existing security and compliance apparatus. That's very nice, where suddenly it's all, someone steals my iPad, or I drop it into the bay, it's locked, it's encrypted. Cool, I go to the store, get myself a new one, restore a backup from iCloud, and I'm up and running again in a very short period of time as if nothing had ever changed. Whereas when I was doing a lot of local development and had bad hard drive issues in the earlier part of my career, well, there goes that month.Mike: Yeah, it's a really good point. I think that we're all walking around with these laptops with really sensitive IP on them, and those are in bars and restaurants. And maybe your drives are encrypted, but there are a lot of additional risks, including, you know, everything that is going over the network, whether I'm at a local coffee shop, and, you know, the latest vulnerability, an update I have to do on my Mac if I'm behind. There's actually a lot of risk in having all that just sort of thrown to the wind and spread across the world, and there's a lot of value in having that in a very safe place. And we've even found that, at Gitpod now, like, the latest product we're working on is one that we call Gitpod Dedicated, which gives you the ability to run inside your own cloud perimeter. And we're doing that on AWS first, and so we can set up and manage an installation of Gitpod inside your own AWS account.And the reason that became important to us is that a lot of companies, a lot of our customers, treat their source code as their most sensitive intellectual property. And they won't allow it to leave their perimeter, like, they may run in AWS, but they have this concept of, sort of like, our perimeter, and you're either inside of that or outside of it. And I think this speaks a little bit to a blog post that you wrote a few months ago about the lagging adoption of remote development environments. I think one of those aspects is, sort of, convenience and the user experience, but the other is that you can't use them very well with your stack and all the tools and resources that you need to use if they're not running, sort of, close within your perimeter. And so, you know, we're finding that companies have this need to be able to have greater control, and now with the, sort of, trends around, like, coding assistants and generative AI, it's even the perfect storm of not only am I, like, sending my source code from my editor out into some [LM 00:22:36], but I also have the risk of an LLM that might be compromised injecting code that I'm committing on my behalf that may be introducing vulnerabilities. 
And so, I think, like, getting that off to a secure space that is consistent and sound and can be monitored, to be kept up-to-date, I think it has the ability to, sort of, greatly increase a customer's security posture.Corey: While we're here kicking the beehive, for lack of a better term, your support for multiple editors in Gitpod the product, I assumed that most people would go with VS Code because I tend to see it everywhere, and I couldn't help but notice that neither vi nor Emacs is one of the options, the last time I checked. What are you seeing as far as popularity contests go? And that might be a dangerous question because I'm not suggesting you alienate many of the other vendors who are available, but in the world I live in, it's pretty clear where the zeitgeist of my subculture is going.Mike: Yeah, I mean, VS Code is definitely the most popular IDE. The majority of people that use Gitpod—and especially we have a, like, a pretty heavy free usage tier—use it in the browser, just for the convenience of having that in the browser and having many environments in the browser. We tend to find more professional developers use VS Code desktop or the JetBrains suite of IDEs.Corey: Yeah, JetBrains I'm seeing a fair bit of in a bunch of different ways and I think that's actually most of what your other options are. I feel like people have either gone down the JetBrains path or they haven't and it seems like it's very, people who are into it are really into it and people who are not are just, never touch it.Mike: Yeah, and we want to provide the options for people to use the tools that they want to use and feel comfortable on. And we also want to provide a platform for the next generation of IDEs to be able to build on, and to be able to support this concept of cloud or remote development more natively. So, like I mentioned, Nathan Sobo at Zed, I met up with him last week—I'm in Denver; he's in Boulder—and we were talking about this and he's interested in Zed working in the browser, and he's talked about this publicly. And for us, it's really interesting because, like, IDEs working in the browser is, like, a really great convenience. It's not the perfect way to work, necessarily, in all circumstances.There are some challenges with, like, all this tab sprawl and stuff, but it gives us the opportunity, if we can make Zed work really well for Gitpod—or anybody else building an IDE—for that to work in the browser. Ultimately what we want is that if you want to use a terminal, we want to create a great experience for you for that. And so, we're working on this ability in Gitpod to be able to effectively, like, bring your own IDE, if you're building on that, and to be able to offer it and distribute it on Gitpod, to be able to create a new developer tool and make it so that anybody in their Gitpod workspace can launch that as part of their workspace, part of their tool. And we want to see developer tools and IDEs flourish on top of this platform that is cloud development because we want to give people choice. Like, at Gitpod, we're not building our own IDE anymore.The team started to. They created Theia, which was one of the original cloud, sort of, web-based IDEs that has now been handed over to the Eclipse Foundation. But we moved to VS Code because we found that that's where the ecosystem was. That's where our users were, and our customers, and what they wanted to use. 
But we want to expand beyond that and give people the ability to choose, not only the options that are available today but the options that should be available in the future. And we think that choice is really important.Corey: When you see people kicking the tires on Gitpod for the first time, where does the bulk of their hesitancy come from? Like, what is it where—people, in my experience, don't love to embrace change. So, it's always this thing, “This thing sucks,” is sort of the default response to anything that requires them to change their philosophy on something. So okay, great. That is a thing that happens. We'll see what people say or do. But are they basing it on anything beyond just familiarity and comfort with the old way of doing things or are there certain areas that you're finding the new customers are having a hard time wrapping their heads around?Mike: There are a couple of things. I think one thing is just habit. People have habits and preferences, which are really valuable because it's the way that they've learned to be successful in their careers and the way that they expect things. Sometimes people have these preferences that are fairly well ingrained, whether rational or irrational. And so, one thing is just people's force of habit.And then getting used to this idea that if it's not on my laptop, it means—like what you mentioned before, it's always what-ifs of, like, “What if I'm on a plane?” Or like, “What if I'm at the airport in a hurricane?” “What if I'm on a train with a spotty internet connection?” And so, there's all these sort of what-if situations. And once people get past that and they start actually using Gitpod and trying to set their projects up, the other limiting factor we have is just connectivity.And that's, like, connectivity to the other resources that you use to develop. So, whether that's, you know, package or module repositories, or some internal services, or a database that might be running behind a firewall, it's like getting connectivity to those things. And that's where the dedicated deployment model that I talked about, running inside of your perimeter on a network they have control over, kind of helps, and that's why we're trying to overcome that. Or if you're using our SaaS product, using something like Tailscale or a more modern VPN that way. But those are the two main things.It's like familiarity, this comfort for how to work, sort of, in this new world and not having this level of comfort of, like, it's running on this thing I can hold, as well as connectivity. And then there is some cost associated with people now paying for this infrastructure they didn't have to pay for before. And I think it's a, you know, it's a mistake to say that we're going to offset the cost of laptops. Like, that shouldn't be how you justify a cloud development environment. Like—Corey: Yeah, I feel like people are not requesting under-specced laptops much these days anymore.Mike: It's just like, I want to use a good laptop; I want to use a really nice laptop with good hardware and that shouldn't be the cost. The proposition shouldn't be, it's like, “Save a thousand dollars on every developer's laptop by moving this off to the cloud.” It's really the time savings. It's the focus. It's the, you know, removing all of that drift and creating these consistent environments that are more secure, and effectively, like, automating your development environment so it's the same for everybody.But that's the—I think habits are the big thing. 
And there is, you know, I talked about a little bit that element of, like, we still have this concept of, like, I have this environment and I start it and it's there, and I pay for it while it's there and I have to clean it up or I have to make sure it stopped. I think that still exists and it creates a lot of sort of cognitive overhead of things that I have to manage that I didn't have to manage before. And I think that we have to—Gitpod needs to be better there and so does everybody else in the industry—about removing that completely. Like, one of the things that I really love that I learned from, like, Stewart Butterfield when I was at Slack was, he always brought up this concept called the convenience threshold.And it was just the idea that when a certain threshold of convenience is met, people's behavior suddenly changes. And as we thought about products and, like, the availability of features, it really drove how we thought about, you know, adoption, like, what is the threshold, what would it take? And, like, a good example of this is even, like, the way we just use credit cards now or debit cards to pay for things all the time, where we were used to carrying cash. And in the beginning, when it was kind of novel that you could use a credit card to pay for things, like even pay for gas, you always had to have cash because you didn't know if it'd be accepted. And so, you still had to have cash, you still had to have it on hand, you still had to get it from the ATM, you still had to worry about, like, what if I get there and they don't accept my card and how much money is it going to be, so I need to make sure I have enough of it.But the convenience of having this card where I don't have to carry cash is I don't have to worry about that anymore, as long as I have money in my bank account. And it wasn't until those cards were accepted more broadly that I could actually rely on having that card and not having the cash. It's similar when it comes to cloud development environments. It needs to be more convenient than my local development environment. It needs to be—it's kind of like early—I remember when laptops became more common, I was used to developing on a desktop, and people were like, nobody's ever going to develop on a laptop, it's not powerful enough, the battery runs out, and, you know, when you close the lid and open the lid, it used to take, like, five minutes before it would resume and unhibernate and stuff, and it was amazing when you could just close it and open it and get back to where you were.But, like, that's the case where laptops weren't as convenient as desktops, because desktops were always plugged in, powered on; you could leave them and effectively just come back, sit down, and pick up where you left off. And so, I think that this is another moment where we need to make these cloud development environments more convenient to be able to use and ultimately better. And part of that convenience is to make it so that you don't have to think about all these parts of them: whether they're running, not running, how much they cost, whether you're going to be there [unintelligible 00:31:35] or lose their data. 
Like, that should be the value of it: that I don't have to think about any of that stuff.Corey: So, my last question for you is, when you take a look at people who have migrated to using Gitpod, specifically from the corporate perspective, what are their realizations after the fact—I mean, assuming they still take your phone calls because that's sort of feedback of a different sort—but what have they realized has worked well? What keeps them happy and coming back and taking your calls?Mike: Yeah, our customers can focus on their business instead of focusing on all the issues that they have with configuring development environments and everything that could go wrong. And so, a good example of this is a customer we have, Quizlet: Quizlet saw a 45-point increase in developer satisfaction and a 60% reduction in incidents, and the time that it takes to onboard new engineers went down to ten minutes. So, we have some customers that we talk to that come to us and say, “It takes us 20 days to onboard an engineer because of all the access they need and everything you need to set up and credentials and things, and now we can boil that down to a button click.” And that's the thing that we tend to hear from people: that, like, they just don't have to worry about this anymore and they tend to be able to focus on their business and what the developers are actually trying to do, which is build their product.And in Quizlet's example, it was really cool to see them mentioned in one of the recent OpenAI announcements around GPT-4 and plugins: they were one of the early customers that built ChatGPT plugins, and they mentioned that they were sharing a lot of Gitpod URLs around when we reached out to congratulate them. And the thing that was great about that, for us, is, like, they were talking about their business and what they were developing and how they were being successful. And we'd rather see Gitpod in your development environment just sort of disappear into the background. We'd actually like to not hear from customers because it's just working so well for them. So, that's what we found: customers are just able to get to this point where they can just focus on their business and focus on what they're trying to develop and focus on making their customers successful and not have to worry about infrastructure for development. 
And I hope to see you there at our conference. You should come. Consider this an invite for June 1st and 2nd in San Francisco at CDE Universe.Corey: Of course. And we will put links to this in the [show notes 00:34:53]. Thank you so much for being so generous with your time. I appreciate it.Mike: Thanks, Corey. That was really fun.Corey: Mike Brevoort, Chief Product Officer at Gitpod. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment detailing exactly why cloud development is not the future, but then lose your content halfway through because your hard drive crashed.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
With the participation of a psychosocial worker and a peer supporter from the Société québécoise de la schizophrénie et des psychoses apparentées (SQS), we answer several questions surrounding schizophrenia. Who is affected? What is this illness? What are the positive and negative symptoms? What are the treatments? And more. Hosted on Acast. See acast.com/privacy for more information.
An airhacks.fm conversation with Maximilian Schellhorn (@maschnetwork) about: playing Halo with a Fujitsu Siemens Scaleo, amazing graphics with Crytek and Crysis, learning HTML and the great marquee tag, semi-professional Call of Duty 4 gaming, learning Delphi and GUI programming, button-oriented programming in Delphi, building ski school software in Delphi, from Delphi to Java and Spring, learning patterns with Java, starting at cloudflight.io, from Zalando to AWS, starting at AWS as Solution Architect, deploying Quarkus as AWS Lambda, full-stack Infrastructure as Code with Java, creating Java content on AWS, SAM, CloudFormation and CDK, CDK with SAM CLI, declarative development with SAM, state management and IaC, AWS Serverless Java Container, Lambda SnapStart optimises the startup time, SnapStart snapshots, using CRaC hooks for SnapStart, caching with SnapStart vs. PostConstruct, from Kotlin to Java 17, data classes in Kotlin vs. Java Records, sealed classes as error handlers, serverless on-premise and in the clouds, using S3 and DynamoDB, DynamoDB and IAM security, lift-and-shift, Lambda SQS integration, Java on AWS Lambda Workshop, Caching Data in the Snapshot, announcement: AWS Lambda adds support for Java 17 Maximilian Schellhorn on twitter: @maschnetwork
Waldemar Hummer, Co-Founder & CTO of LocalStack, joins Corey on Screaming in the Cloud to discuss how LocalStack changed Corey's mind on the futility of mocking clouds locally. Waldemar reveals why LocalStack appeals to both enterprise companies and digital nomads, and explains how both see improvements in their cost predictability as a result. Waldemar also discusses how LocalStack is an open-source company first and foremost, and how they're working with their community to evolve their licensing model. Corey and Waldemar chat about the rising demand for esoteric services, and Waldemar explains how accommodating that has led to an increase in adoption from the big data space. About WaldemarWaldemar is Co-Founder and CTO of LocalStack, where he and his team are building the world-leading platform for local cloud development, based on the hugely popular open source framework with 45k+ stars on GitHub. Prior to founding LocalStack, Waldemar held several engineering and management roles at startups as well as large international companies, including Atlassian (Sydney), IBM (New York), and Zurich Insurance. He holds a PhD in Computer Science from TU Vienna.Links Referenced: LocalStack website: https://localstack.cloud/ LocalStack Slack channel: https://slack.localstack.cloud LocalStack Discourse forum: https://discuss.localstack.cloud LocalStack GitHub repository: https://github.com/localstack/localstack TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Until a bit over a year ago or so, I had a loud and some would say fairly obnoxious opinion around the futility of mocking cloud services locally. This is not to be confused with mocking cloud services on the internet, which is what I do in lieu of having a real personality. And then one day I stopped espousing that opinion, or frankly, any opinion at all. And I'm glad to be able to talk at long last about why that is. My guest today is Waldemar Hummer, CTO and co-founder at LocalStack. Waldemar, it is great to talk to you.Waldemar: Hey, Corey. It's so great to be on the show. Thank you so much for having me. We're big fans of what you do at The Duckbill Group and Last Week in AWS. So really, you know, glad to be here with you today and have this conversation.Corey: It is not uncommon for me to have strong opinions that I espouse—politely to be clear; I'll make fun of companies and not people as a general rule—but sometimes I find that I've not seen the full picture and I no longer stand by an opinion I once held. And you're one of my favorite examples of this because, over the course of a 45-minute call with you and one of your business partners, I went from, “What you're doing is a hilarious misstep and will never work,” to, “Okay, and do you have room for another investor?” And in the interest of full disclosure, the answer to that was yes, and I became one of your angel investors. It's not exactly common for me to do that kind of a hard pivot. And I kind of suspect I'm not the only person who currently holds the opinion that I used to hold, so let's talk a little bit about that. 
At the very beginning, what is LocalStack, and what would you say it is that you folks do?Waldemar: So LocalStack, in a nutshell, is a cloud emulator that runs on your local machine. It's basically like a sandbox environment where you can develop your applications locally. We currently have a range of around 60, 70 services that we provide, things like Lambda Functions, DynamoDB, SQS, like, all the major AWS services. And to your point, it is indeed a pretty large undertaking to actually implement the cloud and run it locally, but with the right approach, it actually turns out that it is feasible and possible, and we've demonstrated this with LocalStack. And I'm glad that we've convinced you to think of it that way as well.Corey: A couple of points that you made during that early conversation really stuck with me. The first is, “Yeah, AWS has two, no three, no four hundred different service offerings. But look at your customer base. How many of those services are customers using in any real depth? And of those services, yeah, the APIs are vast, and very much a sprawling pile of nonsense, but how many of those esoteric features are those folks actually using?” That was half of the argument that won me over.The other half was, “Imagine that you're an enormous company that's an insurance company or a bank. And this year, you're hiring 5,000 brand new developers, fresh out of school. Two to three thousand of those developers will still be working here in about a year as they wind up either progressing in other directions, not winding up completing internships, or going back to school after internships, or for a variety of reasons. So, you have that many people that you need to teach how to use cloud in the context that we use cloud, combined with the question of how do you make sure that one of them doesn't make a fun mistake that winds up bankrupting the entire company with a surprise AWS bill?” And those two things combined turned me from, “What you're doing is ridiculous,” to, “Oh, my God. You're absolutely right.”And since then, I've encountered you in a number of my client environments. You were absolutely right. This is something that resonates deeply and profoundly with larger enterprise customers in particular, but also folks who just don't want to wind up being beholden to every time they do a deploy to anything to test something out, yay, I get to spend more money on AWS services.Waldemar: Yeah, totally. That's spot on. So, to your first point, we definitely have a core set of services that most people are using. So, things like Lambda, DynamoDB, SQS, like, the core serverless, kind of, APIs. And then there's kind of a long tail of more exotic services that we support these days, things like, even like QLDB, the quantum ledger database, or, you know, managed streaming for Kafka.But, like, certainly, the core 15, 20 services are the ones that are really most used by the majority of people. And then we also, you know, in our pro offering, have some very, sort of, advanced services for different use cases. So, that's to your first point.And second point is, yeah, totally spot on. So, LocalStack, like, really enables you to experiment in the sandbox. So, we see it as both an experimentation and development environment, where you don't need to think about cloud costs. And this, I guess, will be very close to your heart in the work that you're doing: the costs are becoming really predictable as well, right? 
Because in the cloud, you know, I worked at different companies before doing LocalStack where we were using AWS resources, and you can end up in a situation where overnight, you accumulate, you know, hundreds of thousands of dollars of AWS bill because you've turned on a certain feature, or some, you know, connectivity into some VPC or networking configuration that just turns out to be costly.Also, one more thing that is worth mentioning: we want to encourage, like, frequent testing, and a lot of the cloud's billing and cost structure is focused around, for example, hourly billing of resources, right? And if you have a test that just spins up resources that run for a couple of minutes, you still end up paying the entire hour. And with LocalStack, really, that brings down the cloud bills significantly because you can really test frequently, the cycles become much faster, and it's also, again, more efficient, more cost-effective.Corey: There's something useful to be said for, “Well, how do I make sure that I turn off resources when I'm done?” In cloud, it's a bit of a game of guess-and-check. And you turn off things you think are there and you wait a few days and you check the bill again, and you go and turn more things off, and the cycle repeats. Or alternately, you wait for the end of the month and wonder in perpetuity why you're being billed 48 cents a month. Restarting the laptop is a lot more straightforward.I also want to call out some of my own bias on this where I used to be a big believer in being able to build and deploy and iterate on things locally because, well, what happens when I'm on a plane with terrible WiFi? Well, in the before times, I flew an awful lot and was writing a fair bit of, well, cloudy nonsense, and I still never found that to be a particular blocker on most of what I was doing. So, it always felt a little bit precious to me when people were talking about, well, what if I can't access the internet to wind up building and deploying these things? It's now 2023. How often does that really happen? But is that a use case that you see a lot of?Waldemar: It's definitely a fair point. And probably, like, 95% of cloud development these days is done in a high internet bandwidth environment, maybe some corporate network where you have really fast internet access. But that's only a subset, I guess, of the world out there, right? So, there might be situations where, you know, you may have bad connectivity. Also, maybe you live in a region—or maybe you're traveling, even, right? So, there's a lot more and more people who are just, “Digital nomads,” quote-unquote, right, who just like to work in remote places.Corey: You're absolutely right. My bias is that I live in San Francisco. I have symmetric gigabit internet at home. There's not a lot of scenarios in my day-to-day life—except when I'm, you know, on the train or the bus traveling through the city—because thank you, Verizon—where I have impeded connectivity.Waldemar: Right. Yeah, totally. And I think the other aspect of this is kind of that developers just like to have things locally, right, because it gives them the feeling of, you know, better control over the code, like, being able to integrate into their IDEs, setting breakpoints, having these quick cycles of iterations. 
And again, this is something where there's more and more tooling coming up in the cloud ecosystem, but it's still inherently a remote execution that just, you know, takes the round trip of uploading your code, deploying, and so on, and that's just basically the pain point that we're addressing with LocalStack.Corey: One thing that did surprise me as well was discovering that there was a lot more appetite for this sort of thing in enterprise-scale environments. I mean, some of the reference customers that you have on your website include divisions of the UK Government and 3M—you know, the Post-It note people—as well as a number of other very large environments. And at first, that didn't make a whole lot of sense to me, but then it suddenly made an awful lot of sense because it seems—and please correct me if I'm wrong—that in order to use something like this at scale, in a way where the administration of it isn't more trouble than it's worth, you need to progress past a certain point of scale. An individual developer on their side project is likely just going to iterate against AWS itself, whereas a team of thousands of developers might not want to be doing that because they almost certainly have their own workflows that make that process high friction.Waldemar: Yeah, totally. So, what we see a lot is, especially in larger enterprises, dedicated teams, like, developer experience teams, whose main job is to really set up a workflow and environment where developers can be most productive, and this can be, you know, on one side, like, setting up automated pipelines, provisioning maybe AWS sandbox and test accounts. And for some of these teams, when we introduce LocalStack, it's really a game-changer because it becomes much more decoupled and, like, you know, distributed. You can basically configure your CI pipeline to just, you know, spin up the container, run your tests, and tear it down again afterwards. So, you know, there are fewer dependencies.And also, one aspect to consider is the aspect of cloud approvals. A lot of companies that we work with have, you know, very stringent processes around even getting access to the cloud. Some SRE team needs to enable their IAM permissions and so on. With LocalStack, you can just get started from day one and just get productive and start testing from the local machine. So, I think those are patterns that we see a lot, especially in larger enterprise environments as well, where, you know, there might be some regulatory barriers and just, you know, process-wise steps as well.Corey: When I started playing with LocalStack myself, one of the things that I found disturbingly irritating is, there's a lot that AWS gets largely right with its AWS command-line utility. You can stuff a whole bunch of different options into the config for different profiles, and all the other tools that I use mostly wind up respecting that config. The few that extend it add custom lines to it, but everything else is mostly well-behaved and ignores the things it doesn't understand. But there is no facility that lets you say, “For this particular profile, use this endpoint for AWS service calls instead of the normal ones in public regions.” In fact, to do that, you effectively have to pass specific endpoint URLs to arguments, and I believe the syntax on that is not globally consistent between different services.It just feels like a living nightmare. 
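To make that endpoint pain concrete, here is a minimal sketch of the per-client override in Python with boto3, assuming a LocalStack instance is listening on its default edge port 4566. The port, region, bucket name, and dummy credentials are illustrative assumptions, not details from the conversation.

```python
import boto3

# Sketch: point a boto3 client at a locally running LocalStack
# instead of the real AWS endpoints. Port 4566 is LocalStack's
# default edge port; adjust if your setup differs.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:4566",  # the per-client override
    region_name="us-east-1",
    aws_access_key_id="test",               # LocalStack accepts dummy credentials
    aws_secret_access_key="test",
)

s3.create_bucket(Bucket="demo-bucket")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```

Repeating that override on every client, or passing an endpoint flag on every CLI invocation, is exactly the friction that the wrapper utilities discussed next are meant to hide.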
At first, I was annoyed that you folks wound up having to ship your own command-line utility to wind up interfacing with this. Like, why don't you just add a profile? And then I tried it myself and, oh, I'm not the only person who knows how this stuff works that has ever looked at this and had that idea. No, it's because AWS is just unfortunate in that respect.Waldemar: That is a very good point. And you're touching upon one of the major pain points that we have, frankly, with the ecosystem. So, there are some pull requests against the AWS open-source repositories for the SDKs and various other tools, where folks—not only LocalStack, but other folks in the community—have asked for introducing, for example, an AWS endpoint URL environment variable. These [protocols 00:12:32], unfortunately, were never merged. So, it would definitely make our lives a whole lot easier, but so far, we basically have to maintain these, you know, these wrapper scripts, basically, AWS local, CDK local, which basically just, you know, point the client to local endpoints. It's a good workaround for now, but I would assume and hope that the world's going to change in the upcoming years.Corey: I really hope so because everything else I can think of is just bad. The idea of building a custom wrapper around the AWS command-line utility that winds up checking the profile section, and oh, if this profile is that one, call out to this tool, otherwise it just becomes a pass-through. That has security implications that aren't necessarily terrific, you know, in large enterprise companies that care a lot about security. Yeah, pretending to be a binary you're not is usually the kind of thing that makes people sad when security politely kicks their door in.Waldemar: Yeah, we actually have pretty, like, big hopes for the v3 wave of the AWS SDKs, because there is some restructuring happening with the endpoint resolution. And also, you can, in your profile, by now have, you know, special resolvers for endpoints. But still, the case of just pointing all the SDKs and CLI to a custom endpoint is just not yet resolved. And this is, frankly, quite disappointing, actually.Corey: While we're complaining about the CLI, I'll throw one of my recurring issues with it in. I would love for it to adopt the Linux slash Unix paradigm of having a config.d directory that you can reference from within the primary config file, and then any file within that directory in the proper syntax winds up getting adopted into what becomes a giant composable config file, generated dynamically. The reason being is, I can have entire lists of profiles in separate files that I could then wind up dropping in and out on a client-by-client basis. So, I don't inadvertently expose who some of my clients are, in the event that winds up being part of the way that they have named their AWS accounts.That is one of those things I would love, but it feels like it's not a common enough use case for there to be a whole lot of traction around it. And I guess some people would make a fair point if they were to say that the AWS CLI is the most widely deployed AWS open-source project, even though all it does is give money to AWS more efficiently.Waldemar: Yeah. Great point. Yeah, I think, like, having some way to customize and, like, mingle or mangle your configurations in an easier fashion would be super useful. 
I guess it might be a slippery slope to getting, you know, into something like, I don't know, Helm for EKS and, like, really, you know, having to maintain a whole templating language for these configs. But I certainly agree with you that just, you know, at least having [plug 00:15:18] points for being able to customize the behavior of the SDKs and CLIs would be extremely helpful and valuable.Corey: This is not—unfortunately—my first outing with the idea of trying to have AWS APIs done locally. In fact, almost a decade ago now, I did a build-out at a very large company of a… well, I would say that the build-out was not itself very large—it was about 300 nodes—that were all running Eucalyptus, which, before it died on the vine, was imagined as a way of just emulating AWS APIs locally—done in Java, as I recall—and exposing local resources in ways that comported with how AWS did things. So, the idea being that you could write configuration to deploy any infrastructure you wanted in AWS, but also treat your local data center the same way. That idea unfortunately did not survive in the marketplace, which is kind of a shame, on some level. What was it that inspired you folks to wind up building this with an eye towards local development rather than run this as a private cloud in your data center instead?Waldemar: Yeah, very interesting. And I do also have some experience [unintelligible 00:16:29] from my past university days with Eucalyptus and OpenStack, also, you know, running some workloads in an on-prem cluster. I think the main difference, first of all, is that these systems were extremely hard, notoriously hard, to set up and maintain, right? So, lots of moving parts: you had your image server, your compute system, and then your messaging subsystems. Lots of moving parts, and we wanted to have everything basically much more monolithic and in a single container.And Docker really sort of provides a great platform for us, which is: create everything in a single container, spin it up locally, make it very lightweight and easy to use. But I think really, in the first days of LocalStack, the idea actually came from a use case of somebody on our team. Back then, I was working at Atlassian in the data engineering team, and we had folks on the team who were commuting to work on the train. And it was literally this use case that you mentioned before about being able to work basically offline on your commute. And this is kind of where the first lines of code were written, and then kind of the idea evolved from there.We put it out into the open-source, and then, kind of, it grew over the years. But it really started not as an on-prem, like, heavyweight server, but really as a lightweight system that is easily portable across different systems.Corey: That is a good question. Very often, when I'm using various tools that are aimed at development use cases, it is very clear that one particular operating system is invariably going to be the first-class citizen and everything else is a best effort. Ehh, it might work; it might not. Does LocalStack feel that way? And if so, what's the operating system that you want to be on?Waldemar: I would say we definitely work best on Mac OS and Linux. It also works really well on Windows, but I think given that some of the tooling in the ecosystem is also pretty much geared towards Unix systems, I think those are the platforms it will work well with. 
Again, on the other hand, Docker is really a platform that helps us a lot with being compatible across operating systems and also CPU architectures. We have a multi-arch build now for AMD and ARM64. So, I think in that sense, we're pretty broad in terms of the compatibility spectrum.Corey: I do not have any insight into how the experience goes on Windows, given that I haven't used that operating system in anger for, wow, 15 years now, but I will say that it's been top-flight on Mac OS, which is where I spend most of my time, depressing as that may be, but for desktop experiences, it seems to work out fairly well. That said, having a focus on Windows seems like it would absolutely be a hard requirement, given that so many developer workstations in very large enterprises tend to skew very Windows-heavy. My hat is off to people who work with Linux and Linux-like systems in environments like that where even line endings become psychotically challenging. I don't envy them their problems. And I have nothing but respect for people who can power through it. I never had the patience.Waldemar: Yeah. Same here, and definitely, I think everybody has their favorite operating system. For me, it's also been mostly Linux and Mac in the last couple of years. But certainly, we definitely want to be broad in terms of the adoption, and working with large enterprises, often you have—you know, we want to fit into the existing landscape and environment that people work in. And we solve this with platform abstractions like Docker, for example, as I mentioned, and also, for example, Python; some of the tooling within Python is also pretty nicely supported across platforms. But I do feel the same way as you, like, having been working with Windows for quite some time, especially for development purposes.Corey: What have you noticed that your customer usage patterns slash requests have been saying about AWS service adoption? I have to imagine that everyone cares whether you can mock S3 effectively. EC2, DynamoDB, probably. SQS, of course. But beyond the very small baseline level of offering, what have you seen surprising demand for, as, I guess, customer implementation of more esoteric services continues to climb?Waldemar: Mm-hm. Yeah, so these days it's actually pretty [laugh] pretty insane the level of coverage we already have for different services, including some very exotic ones, like QLDB, as I mentioned, or Kafka. We even have Managed Airflow, for example. I mean, a lot of these services are essentially mostly, like, wrappers around the API. This is essentially also what AWS is doing, right? So, they're providing an API that basically provisions some underlying resources, some infrastructure.Some of the more interesting parts, I guess, we've seen are in the data or big data ecosystem. So, things like Athena and Glue, we've invested quite a lot of time in, you know, making those available also in LocalStack, so you can have your, maybe, CSV files or JSON files in an S3 bucket and you can query them from Athena with SQL, basically, right? And that means that especially these big data-heavy jobs that are very heavyweight on AWS, you can iterate on very quickly in LocalStack. So, this is where we're seeing a lot of adoption recently. 
And then also, obviously, things like, you know, Lambda and ECS, like, all the serverless and containerized applications, but I guess those are the more mainstream ones.Corey: I imagine you probably get your fair share of requests for things like CloudFormation or CloudFront, where, this is great, but can you go ahead and add a very lengthy sleep right here, just because it returns way too fast and we don't want people to get their hopes up when they use the real thing. On some level, it feels like exact replication of the AWS customer experience isn't quite in line with what makes sense from a developer productivity point of view.Waldemar: Yeah, that's a great point. And I'm sure that, like, a lot of code out there is probably littered with sleep statements that are just tailored to the specific timing in AWS. In fact, we recently opened an issue in the AWS Terraform provider repository to add a configuration option to configure the timings that Terraform is using for the resource deployment. So, just as an example, an S3 bucket creation takes 60 seconds, like, more than a minute, against [unintelligible 00:22:37] AWS. In LocalStack, I guess, it's a second, basically, right?And the AWS Terraform provider has these, like, relatively slow cycles of checking whether the bucket has already been created. And we want to get that configurable, to actually reduce the time it takes for local development, right? So, we have an open, sort of, feature request, and we're probably going to contribute to the Terraform repository. But definitely, I share the sentiment that a lot of the tooling ecosystem is built and tailored and optimized towards the experience against the cloud, which often is just slow and, you know, that's what it is, right?Corey: One thing that I didn't expect, though in hindsight it's blindingly obvious, is your support for a variety of different frameworks and deployment methodologies. I've found that it's relatively straightforward to get up and running with the CDK deploying to LocalStack, for instance. And in hindsight, of course; that's obvious. When you start out down that path, though, it's—well, you tend to think—at least I don't tend to think in that particular way. It's, “Well, yeah, it's just going to be a console-like experience, or I wind up doing CloudFormation or Terraform.” But yeah, the world is advancing relatively quickly and it's nice to see that you are very comfortably keeping pace with that advancement.Waldemar: Yeah, true. And I guess for us, it's really, like, the level of abstraction is sort of increasing, so, you know, once you have a solid foundation with a CloudFormation implementation, you can leverage a lot of tools that are sitting on top of it: CDK, serverless frameworks. So, CloudFormation is almost becoming, like, the assembly language of the AWS cloud, right, and if you have very solid support for that, a lot of, sort of, tools in the ecosystem will natively be supported on LocalStack. And then, you know, you have things like Terraform, and the Terraform CDK, you know, some of these derived versions of Terraform, which also are very straightforward because you just need to point, you know, the target endpoint to localhost and then the rest of the deployment loop just works out of the box, essentially.So, I guess for us, it's really mostly being able to focus on, like, the core emulation, making sure that we have very high parity with the real services. We put a lot of time and effort into what we call parity testing and snapshot testing. 
We make sure that our API responses are identical and really the same as they are in AWS. And this really gives us, you know, very strong confidence that a lot of tools in the ecosystem are working out-of-the-box against LocalStack as well.Corey: I would also like to point out that I'm also a proud LocalStack contributor at this point because at the start of this year, I noticed, ah, in one of the pages, the copyright year was still saying 2022 and not 2023. So, a single-character pull request? Oh, yes, I am on the board now because that is how you ingratiate yourself with an open-source project.Waldemar: Yeah. Eternal fame to you and kudos for your contribution. But, [laugh] you know, in all seriousness, we do have quite an active community of contributors. We are an open-source-first project; like, we were born in open-source. We actually—maybe just touching upon this for a second—we use GitHub for our repository, we use a lot of automation around, you know, doing pull requests, and, you know, service owners.We also participate in things like Hacktoberfest, which we took part in last year to really encourage contributions from the community, and we also host regular meetups with folks in the community to really make sure that there's an active ecosystem where people can contribute and make contributions like the one that you did with documentation and all that, but also, like, actual features, testing, and, you know, contributions of different levels. So really, kudos and a shout out to the entire community out there.Corey: Do you feel that there's an inherent tension between being an open-source product as well as being a commercial product that is available for sale? I find that a lot of companies feel vaguely uncomfortable with the various trade-offs that they make going down that particular path, but I haven't seen anyone in the community upset with you folks, and it certainly hasn't seemed to act as a brake on your enterprise adoption, either.Waldemar: That is a very good point. So, we're following an open-source-first model where, you know, the core of the codebase is available in the community version. And then we have pro extensions, which are commercial, and you basically, you know, sign up for a license. We are certainly having a lot of discussions on how to evolve this licensing model going forward, you know, which parts to feed back into the community version of LocalStack. And it's certainly an ongoing, evolving model as well, but certainly, so far, the support from the community has been great.And we definitely focus on, kind of, getting a lot of the innovation that we're doing back into our open-source repo and making sure that it's, like, really not only open-source but also open-contribution for folks to make their contributions. We also integrate with other third-party libraries. We're built on the shoulders of giants, if I may say so, other open-source projects that are doing great work with emulators. To name just a few, it's like, [unintelligible 00:27:33], which is a great project that we sort of use and depend upon. We have certain mocks and emulations, for Kinesis, for example, Kinesis Mock, and a bunch of other tools that we've been leveraging over the years, which are really great community efforts out there. 
And it's great to see such an active community that's really making this vision possible of a truly local emulated cloud that gives the best experience to developers out there.Corey: So, as of, well, now, when people are listening to this and the episode gets released, v2 of LocalStack is coming out. What are the big differences between LocalStack and now LocalStack 2: Electric Boogaloo, or whatever it is you're calling the release?Waldemar: Right. So, we're super excited to release our v2 version of LocalStack. Planned release date is end of March 2023, so hopefully, we will make that timeline. We did release our first version of LocalStack in July 2022, so it's been roughly seven months since then, and we try to have a cadence of roughly six to nine months for the major releases. And what you can expect is we've invested a lot of time and effort in the last couple of months and in the last year to really make it a very rock-solid experience, with enhancements in the current services, a lot of performance optimizations, and a lot of investment in parity testing.So, as I mentioned before, parity is really important for us, to make sure that we have high coverage of the different services and that they behave the same way as AWS. And we're also putting out an enhanced and completely polished version of our Cloud Pods experience. So, Cloud Pods is a state management mechanism in LocalStack. So, by default, the state in LocalStack is ephemeral, so when you restart the instance, you basically have a fresh state. But with Cloud Pods, we enable our users to take persistent snapshots of the state, save them to disk or to a server, and easily share them with team members.And we have a very polished experience with Community Cloud Pods that makes it very easy to share the state among team members and with the community. So, those are just some of the highlights of things that we're going to be putting out in the tool. And we're super excited to have it done by, you know, end of March. So, stay tuned for the v2 release.Corey: I am looking forward to seeing how the experience shifts and evolves. I really want to thank you for taking time out of your day to wind up basically humoring me and effectively re-covering ground that you and I covered about a year and a half ago now. If people want to learn more, where should they go?Waldemar: Yeah. So definitely, our Slack channel is a great way to get in touch with the community, and also with the LocalStack team, if you have any technical questions. So, you can find it on our website, I think it's slack.localstack.cloud.We also host a Discourse forum. It's discuss.localstack.cloud, where you can just, you know, make feature requests and participate in the general conversation.And we do host monthly community meetups. Those are also available on our website. If you sign up, for example, for the newsletter, you will be notified when we have, you know, these webinars. They take about an hour or so, and we often have guest speakers from different companies, people who are using, you know, cloud development, local cloud development, and just sharing their experiences of how the space is evolving. And we're always super happy to accept contributions from the community in these meetups as well. And last but not least, our GitHub repository is a great way to file any issues you may have, make feature requests, and just get involved with the project itself.Corey: And we will, of course, put links to that in the [show notes 00:31:09]. 
Thank you so much for taking the time to speak with me today. I appreciate it.Waldemar: Thank you so much, Corey. It's been a pleasure. Thanks for having me.Corey: Waldemar Hummer, CTO and co-founder at LocalStack. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment, presumably because your compensation structure requires people to spend ever-increasing amounts of money on AWS services.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
I just love talking to my SkinQueens!! Sera is such a testament to herself and the SkinQueen Society program!! For more info on SQS hit the link in my bio! MUCH LOVE!! xxx P.S. Don't forget to take advantage of my 50% off code for Queen of KPI! Click the link below to join now! https://skinqueen.mykajabi.com/offers/Kd8vwPvJ?coupon_code=SKINLOVER Lots of Love, SkinQueen xo
In this episode of the AWS Bites Podcast, we dive into the serverless pattern of using AWS Lambda together with SQS. We explain the basics of both Lambda and SQS for those who may not be familiar with them. We talk about how we use Lambda, a Function as a Service offering in AWS, to write our own functions and have AWS run them in response to certain events. And we also discuss SQS, a scalable and managed queuing system available on AWS, which we use to offload work to background workers. We delve into how the two services work together through the use of "Event Source Mapping" in Lambda, which polls our SQS queue and makes synchronous Lambda invocation requests when messages are available. We also mention how this feature provides us with the ability to control batch size and batching window, as well as specify filters to save execution time and cost. We also share one of the limitations we faced when using SQS and Lambda together, which was the lack of control over concurrency and the potential for excessive throttling. Recently, though, AWS released a new feature called "SQS maximum concurrency support," which allows us to specify a maximum number of concurrent invocations for an Event Source Mapping. This solves the problem of excessive throttling and eliminates the need to use reserved concurrency. It also allows for more control over concurrency when using multiple Event Source Mappings with the same function. We explain how this new feature has improved our workflow and made it much more efficient.
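For readers who want to see the shape of that configuration, here is a minimal sketch in Python with boto3 of creating an Event Source Mapping with a batch size, a batching window, a filter, and the newer maximum concurrency cap. The queue ARN, function name, and specific values are placeholder assumptions for illustration, not recommendations from the episode.

```python
import boto3

lambda_client = boto3.client("lambda")

# Sketch: subscribe a Lambda function to an SQS queue via an
# Event Source Mapping. The ARN and function name are placeholders.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:eu-west-1:123456789012:work-queue",
    FunctionName="worker-fn",
    BatchSize=10,                        # up to 10 messages per invocation
    MaximumBatchingWindowInSeconds=5,    # wait up to 5s to fill a batch
    # Only deliver messages whose (JSON) body matches this pattern,
    # saving execution time and cost on irrelevant messages.
    FilterCriteria={
        "Filters": [{"Pattern": '{"body": {"type": ["order_created"]}}'}]
    },
    # The maximum concurrency support described above: cap concurrent
    # invocations for this mapping without using reserved concurrency.
    ScalingConfig={"MaximumConcurrency": 5},
)
```

With `ScalingConfig` in place, the poller stops scaling out once five concurrent invocations are in flight, which avoids the excessive throttling that reserved concurrency workarounds used to cause.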
Discover the simplest and most effective way to send transactional and marketing emails from your API. AWS SES (Simple Email Service) lets you integrate your application with Amazon's email API, with high performance, delivery feedback, and DKIM, SPF, and DMARC validation, easily and cheaply, using either SMTP or the SES API itself. It also integrates with several other AWS services, such as SQS and SNS, to automate your email-sending workflows. The AWS 2.0 course is being prepared with great care and dedication to meet the main market demands for technology professionals and entrepreneurs. Sign up now to take advantage of all the pre-launch benefits: https://www.uminventorqualquer.com.br/curso-aws/ Subscribe to the Wesley Milan channel to follow reviews of AWS services: https://bit.ly/3LqiYwg Subscribe to the Wesley Milan channel in English and recommend it to your English-speaking friends: https://bit.ly/3LqFjtA Follow me on Instagram: https://bit.ly/3tfzAj0 LinkedIn: https://www.linkedin.com/in/wesleymilan/ Podcast: https://bit.ly/3qa5JH1
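To make the transactional flow concrete, here is a minimal sketch of a send through the SES API using Python and boto3. The region and addresses are placeholder assumptions; SES requires the sending identity to be verified (with DKIM and SPF configured per domain), and new accounts start in a sandbox that restricts recipients.

```python
import boto3

ses = boto3.client("ses", region_name="us-east-1")

# Sketch: a minimal transactional email via the SES API.
# Sender and recipient are placeholders; the sending identity
# must be a verified SES identity.
ses.send_email(
    Source="no-reply@example.com",
    Destination={"ToAddresses": ["customer@example.com"]},
    Message={
        "Subject": {"Data": "Your order has shipped"},
        "Body": {"Text": {"Data": "Thanks for your purchase!"}},
    },
)
```

The same send can also be done over plain SMTP with per-user SMTP credentials; the API route shown here is usually the simpler one when the application already uses an AWS SDK.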
Understand how this impressive tool works and how it can increase your application's scalability by using message flows to hand asynchronous tasks off to workers. Use SQS for standard queues or FIFO queues, preserving first-in, first-out ordering and even grouping messages, with deduplication inside the queue. AWS SQS supports server-side encryption, keeping your content secure and allowing you to process sensitive data such as payments and credit analysis. The AWS 2.0 course is being prepared with great care and dedication to meet the main market demands for technology professionals and entrepreneurs. Sign up now to enjoy all the pre-launch benefits: https://www.uminventorqualquer.com.br/curso-aws/ Subscribe to the Wesley Milan channel to follow reviews of AWS services: https://bit.ly/3LqiYwg Subscribe to the English-language Wesley Milan channel and recommend it to your English-speaking friends: https://bit.ly/3LqFjtA Follow me on Instagram: https://bit.ly/3tfzAj0 LinkedIn: https://www.linkedin.com/in/wesleymilan/ Podcast: https://bit.ly/3qa5JH1
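A small boto3 sketch of the FIFO and encryption features mentioned above. The queue name and message are hypothetical; SQS-managed server-side encryption (SSE-SQS) is enabled via a queue attribute.

import boto3

sqs = boto3.client("sqs")

# FIFO queue names must end in ".fifo". Content-based deduplication
# drops messages whose body hash was already seen within the window.
queue = sqs.create_queue(
    QueueName="payments.fifo",
    Attributes={
        "FifoQueue": "true",
        "ContentBasedDeduplication": "true",
        "SqsManagedSseEnabled": "true",  # server-side encryption at rest
    },
)

# Messages sharing a MessageGroupId are delivered strictly in order.
sqs.send_message(
    QueueUrl=queue["QueueUrl"],
    MessageBody='{"payment_id": "p-123", "amount": 4200}',
    MessageGroupId="customer-42",
)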
About JackJack is Uptycs' outspoken technology evangelist. Jack is a lifelong information security executive with over 25 years of professional experience. He started his career managing security and operations at the world's first Internet data privacy company. He has since led unified Security and DevOps organizations as Global CSO for large conglomerates. This role involved individually servicing dozens of industry-diverse, mid-market portfolio companies.Jack's breadth of experience has given him a unique insight into leadership and mentorship. Most importantly, it fostered professional creativity, which he believes is direly needed in the security industry. Jack focuses his extra time mentoring, advising, and investing. He is an active leader in the ISLF, a partner in the SVCI, and an outspoken privacy activist. Links Referenced: UptycsSecretMenu.com: https://www.uptycssecretmenu.com Jack's email: jroehrig@uptycs.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: If you asked me to rank which cloud provider has the best developer experience, I'd be hard-pressed to choose a platform that isn't Google Cloud. Their developer experience is unparalleled and, in the early stages of building something great, that translates directly into velocity. Try it yourself with the Google for Startups Cloud Program over at cloud.google.com/startup. It'll give you up to $100k a year for each of the first two years in Google Cloud credits for companies that range from bootstrapped all the way on up to Series A. Go build something, and then tell me about it. My thanks to Google Cloud for sponsoring this ridiculous podcast.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted guest episode is brought to us by our friends at Uptycs. And they have sent me their Technology Evangelist, Jack Charles Roehrig. Jack, thanks for joining me.Jack: Absolutely. Happy to spread the good news.Corey: So, I have to start. When you call yourself a technology evangelist, I feel—just based upon my own position in this ecosystem—the need to ask, I guess, the obvious question of, do you actually work there, or have you done what I do with AWS and basically inflicted yourself upon a company. Like, well, “I speak for you now.” The running gag that becomes more true every year is that I'm AWS's chief marketing officer.Jack: So, that is a great question. I take it seriously. 
When I say technology evangelist, you're speaking to Jack Roehrig. I'm a weird guy. So, I quit my job as CISO. I left a CISO career. For, like, ten years, I was a CISO. Before that, 17 years doing stuff. Started my own thing, secondaries, investments, whatever.Elias Terman, he hits me up and he says, “Hey, do you want this job?” It was an executive job, and I said, “I'm not working for anybody.” And he says, “What about a technology evangelist?” And I was like, “That's weird.” “Check out the software.”So, I'm going to check out the software. I went online, I looked at it. I had been very passionate about the space, and I was like, “How does this company exist in doing this?” So, I called him right back up, and I said, “I think I am.” He said, “You think you are?” I said, “Yeah, I think I'm your evangelist. Like, I think I have to do this.” I mean, it really was like that.Corey: Yeah. It's like, “Well, we have an interview process and the rest.” You're like, “Yeah, I have a goldfish. Now that we're done talking about stuff that doesn't matter, I'll start Monday.” Yeah, I like the approach.Jack: Yeah. It was more like I had found my calling. It was bizarre. I negotiated a contract with him that said, “Look, I can't just work for Uptycs and be your evangelist. That doesn't make any sense.” So, I advise companies, I'm part of the SVCI, I do secondaries, investment, I mentor, I'm a steering committee member of the ISLF. We mentor security leaders.And I said, “I'm going to continue doing all of these things because you don't want an evangelist who's just an Uptycs evangelist.” I have to know the space. I have to have my ear to the ground. And I said, “And here's the other thing, Elias. I will only be your evangelist while I'm your evangelist. I can't be your evangelist when I lose passion. I don't think I'm going to.”Corey: The way I see it, authenticity matters in this space. You can sell out exactly once, so make it count because you're never going to be trusted again to do it a second time. It keeps people honest, at least the ones you actually want to be doing work with. So, you've been in the space a long time, 20 years give or take, and you've seen an awful lot. So, I'm curious, given that I tend to see about, you know, six or seven different companies in the RSA Sponsor Hall every year selling things because you know, sure hundreds of booths, bunch of different marketing logos and products, but it all distills down to the same five or six things.What did you see about Uptycs that made you say, “This is different?” Because to be very direct, looking at the website, it's, “Oh, what do you sell?” “Acronyms. A whole bunch of acronyms that, because I don't eat, sleep, and breathe security for a living, I don't know what most of them mean, but I'm sure they're very impressive and important.” What does it actually do, for those of us who are practitioners, but not swimming in the security vendor stream?Jack: So, I've been obsessed with this space and I've seen the acronyms change over and over and over again. I'm always the first one to say, “What does that mean?” As the senior guy in the room a lot of time. So, acronyms. What does Uptycs do? What drew me into them? They did HIDS, Host Intrusion Detection System. I don't know if you remember that. Turned into—Corey: Oh, yeah. OSSEC was the one I always wound up using, the open-source version. OSSEC [kids 00:04:10]. It's like, oh, instead of paying a vendor, you can contribute it yourself because your time is free, right? 
Free as in puppy, or these days free as in tier when it comes to cloud.Jack: Oh, I like that. So, yeah, I became obsessed with this HIDS stuff. I think it was evident I was doing it, that it was threat [unintelligible 00:04:27]. And these companies, great companies. I started this new job in an education technology company and I needed a lot of work, so I started to play around with more sophisticated HIDS systems, and I fell in love with it. I absolutely fell in love with it.But there are all these limitations. I couldn't find this company that would build it right. And Uptycs has this reputation as being not very sexy, you know? People telling me, “Uptycs? You're going to Uptycs?” Yeah—I'm like, “Yeah. They're doing really cool stuff.”So, Uptycs has, like, this brand name and I had referred Uptycs before without even knowing what it was. So, here I am, like, one of the biggest XDR, I hope to say, activists in the industry, and I didn't know about Uptycs. I felt humiliated. When I heard about what they were doing, I felt like I had wasted my career.Corey: Well, that's a strong statement. Let's begin with XDR. To my understanding, that's some form of audio cable standard that I use to plug into my microphone. Some would say it, “X-L-R.” I would say it sounds like the same thing. What is XDR?Jack: What is it, right? So, [audio break 00:05:27] implement it, but you install an agent, typically on a system, and that agent collects data on the system: what processes are running, right? Well, maybe it's system calls, maybe it's [unintelligible 00:05:37] as regular system calls. Some of them use the extended Berkeley Packet Filter daemon to get stuff, but one of the problems is that because we are obtaining low-level data on an operating system, it's got to be highly specific. So, you collect all this data, who's logging in, which passwords are changing, all the stuff that a hacker would do as you're typing on the computer. You're maybe monitoring vulnerabilities, it's a ton of data that you're monitoring.Well, one of the problems that these companies face is they try to monitor too much. Then some came around and they tried to monitor too little, so they weren't as real-time.Corey: Sounds like a little pig story here.Jack: Yeah [laugh], exactly. Another company came along with a fantastic team, but you know, I think they came in a little late in the game, and it looks like they're folding now. They were a wonderful company, but one of the biggest problems I saw was the agent, the compatibility. You know, it was difficult to deploy. I ran DevOps and security and my DevOps team uninstalled the agent because they thought there was a problem with it; we proved there wasn't, and four months later, they hadn't completely reinstalled it.So, a CISO who manages the DevOps org couldn't get his own DevOps guy to install this agent. For good reason, right? So, this is kind of where I'm going with all of this XDR stuff. What is XDR? It's an agent on a machine that produces a ton of data.I—it's like omniscience. Yes, I started to turn it in, I would ping developers, I was like, “Why did you just run sudo on that machine?” Right. I mean, I knew everything that was going on in the space, I had a good intro to all the assets, whether they technically run on the on-premise data center or the quote-unquote, “Cloud.” I like to just say the production estate. But it's omniscience.
It's insights, you can create rules, it's one of the most powerful security tools that exists.Corey: I think there's a definite gap as far as—let's narrow this down to cloud for just a second before we expand this into the joy that is data centers—where you can instrument a whole bunch of different security services in any cloud provider—I'm going to pick on AWS because they're the 800-pound gorilla in the room, and frankly, they could use taking down a peg or two by and large—and you wind up configuring all the different security services that in some cases seem totally unaware of each other, but that's the AWS product portfolio for you. And you do the math out and realize that it theoretically would cost you—to enable all these things—about three times as much as the actual data breach you're ideally trying to prevent. So, on some level, it feels like a “Heads, I win; tails, you lose,” style scenario.And the answer is that people have started reaching out to third-party vendors to tie all of this together into some form of cohesive narrative that a human being has a hope in hell of understanding. But everything I've tried to this point still feels like it is relatively siloed, focused on the whole fear, uncertainty, and doubt that is so inherent to so much of the security world's marketing. And it's almost like cost control, where you can spend an almost limitless amount of time, energy, money, et cetera, trying to fix these things, but it doesn't advance your company to the next milestone. It's like buying fire insurance on your building. You can spend all the money on fire insurance. Great, it doesn't get you to the next milestone that propels your company forward. It's all reactive instead of proactive. So, it feels like it is never the exciting, number-one priority for companies until right after it should have been higher in the list than it was.Jack: So, when I worked at Turnitin, we had saturated the market. And we worked in the education technology space globally. Compliance everywhere. So, I just worked on the Australian Data Infrastructure Act of 2020. I'm very familiar with the 27 data privacy regulations that are [laugh] in scope for schools. I'm a FERPA expert, right? I know that there's only one P in HIPAA [laugh].So, all of these compliance regulations drove schools and universities, consortiums, government agencies to say, “You need to be secure.” So, security at Turnitin was the number one—number one—key performance indicator of the company for one-and-a-half years. And these cloud security initiatives didn't just make things more secure. They also allowed me to implement a reasonable control framework to get various compliance certifications. So, I'm directly driving sales by deploying these security tools.And the reason why that worked out so great is, by getting the certifications and by building a sensible control framework layer, I was taking these compliance requirements and translating them into real mitigations of business risk. So, the customers are driving security as they should. I'm implementing sane security controls by acting as the chief security officer, the company becomes more secure, I save money by using the correct toolset, and we increased our business by, like, 40% in a year. This is a multibillion-dollar company.Corey: That is definitely a story that resonates, especially with organizations that are—or they should be—compliance-forward and having to care about the nature of what it is that they're doing.
But I have a somewhat storied history in working in FinTech and large-scale financial services. One of the nice things about that job, which is sort of a weird thing to say there if you don't want to get ejected from the room, has been, “Yeah well, it's only money,” in the final analysis. Because yeah, no one dies if you wind up screwing that up. People's kids don't get exposed.It's just okay, people have to fill out a bunch of forms and you get sued into oblivion and you're not there anymore because the first role of a CISO is to be ablative and get burned away whenever there's a problem. But it still doesn't feel like it does more for a number of clients than, on some level, checking a box that they feel needs to be checked. Not that it shouldn't be, necessarily, but I have a hard time finding people that get passionately excited about security capabilities. Where are they hiding?Jack: So, one of the biggest problems that you're going to face is there are a lot of security people that have moved up in the ranks through technology and not through compliance and technology. These people will implement control frameworks based on audit requirements that are not bespoke to their company. They're doing it wrong. So, we're not ticking boxes; I'm creating boxes that need to be ticked to secure the infrastructure. And at Turnitin, Turnitin was a company that people were forced to use to submit their work in school.So, imagine that you have to submit a sensitive essay, right? And that sensitive essay goes to this large database. We have the Taiwanese government submitting confidential data there. I had the chief scientist at NASA submitting pre-publication data there. We've got corporate trade secrets that are popped in there. We have all kinds of FDA pre-approval stuff. This is a plagiarism detection software being used by large companies, governments, and 12-year-old girls, right, who don't want their data leaked.So, if you look at it, like, this is an ethical thing that is required for us to do, our customers drive that, but truly, I think it's ethics that drive it. So, when we implemented a control framework, I didn't do the minimum, I didn't run an [unintelligible 00:12:15] scan that nobody ran. I looked for tools that satisfied many boxes. And one of the things about the telemetry at scale, [unintelligible 00:12:22], XDR, whatever you want to call it, right? The agent-based systems that monitor all of this run-state data for us—they can take care of a lot of your technical SOC controls.Furthermore, you can use these tools to improve your processes like incident response, right? You can use them to log things. You can eliminate your SIEM by using this for your DLP. The problem with companies in the past is they wouldn't deploy on the entire infrastructure. So, you'd get one company, it would just be on-prem, or one company that would just run on CentOS.One of the reasons why I really liked this Uptycs company is because they built it on osquery. Now, if you mention osquery, a lot of people glaze over, myself included before I worked at Uptycs. But apparently what it is, is it's this platform to collect a ton of data on the run state of a machine in real-time, pop it into a normalized SQL database, and it runs on a ton of stuff: Mac OS, Windows, like, tons of versions of Linux because it's open-source, so people are porting it to their infrastructure. And that was one of these unique differentiators. So, what is the cloud?
I mean, AWS is a place where you can rapidly prototype, there's tons of automation, you can go in and you build something quickly and then it scales.But I view the cloud as just a simple abstraction to refer to all of my assets, be they PoPs, on-premises data machines, you know, the corporate environment, laptops, desktops, the stuff that we buy in the public clouds, right? These things are all part of the greater cloud. So, when I think cloud security, I want something that does it all. That's very difficult because if you had one tool run on your cloud, one tool to run on your corporate environment, and one tool to run for your production environment, those tools are difficult to manage. And the data needs to be ETL'd, you know? It needs to be normalized. And that's very difficult to do.Our company is doing [unintelligible 00:14:07] security right now as a company that's taking all these data signals, and they're normalizing them, right, so that you can have one dashboard. That's a big trend in security right now. Because we're buying too many tools. So, I guess the real answer is, I don't see the cloud as just AWS. I think AWS is not just the cloud—they should call themselves the cloud with everything. You can come in, you can rapidly prototype your software, and you know what? You want to run at the largest scale possible? You can do that too. It's just the governance problem that we run into.Corey: Oh, yes. The AWS product strategy is pretty clearly, in a word, “Yes,” written on a Post-it note somewhere. Running their strategy is the easiest job in the world. The challenge, too, is that we don't live in a world where monocultures are a thing anymore because regardless—if you use AWS for the underlying infrastructure, great, that makes a lot of sense. Use it for a lot of the higher-up-the-stack, SaaS-y type things that you don't want to have to build yourself from—by going to Home Depot and picking up components, you're doing something relatively foolish in most cases.They're a plumbing company, not a porcelain company, in many respects. And regardless of what your intention is around multiple clouds, people wind up using different things. In most cases, you're going to be storing your source code in GitHub, not in AWS CodeCommit because CodeCommit doesn't really have any customers, for reasons that become blindingly apparent the first time you try to use it for something. So, you always wind up with these cross-cloud, cross-infrastructure stories. For any company that had the temerity to be founded before 2010, they probably have an on-premises data center as well—or six or more—and you're starting to try to wind up having a whole bunch of different abstractions viewed through the same lenses in terms of either observability or control plane or governance, or—dare I say it—security. And it feels like there are multiple approaches, all of which have their drawbacks, which of course means, it's complicated. What's your take on it?
And that didn't work.It missed a lot of things, right, or generated a lot of false positives. I think that one of the next big technologies—and I know it's been done for two years—but I think the next thing we're going to see is the Axonius of the consumption of events, the categorization into alerts based on synthetic data classification policies, and we're going to look at the severity classifications of those, they're going to be actionable in a priority queue, and we're going to eliminate the need for people that don't like their jobs and sit at a SOC all day and analyze a SIEM. I don't ever run a SIEM, but I think that this diversity can be a good thing. So, sometimes it's turned out to be a bad thing, right? We wanted diversity, we don't want all the data to be homogenous. We don't need data standards because that limits things. But we do want competition. But I would ask you this, Corey, why do you think AWS? We remember 2007, right?Corey: I do. Oh, I've been around at least that long.Jack: Yeah, you remember when S3 came up. Was that 2007?Corey: I want to say 2004, 2005 in beta, and then relaunched as the first generally available service. The first beta service was SQS, so there's always some question about which one was first. I don't get in the middle of those fights because all I'm going to do is upset people.Jack: But S3 was awesome. It still is awesome, right?Corey: Oh yes.Jack: And you know what I saw? I worked for a much older company with very strict governance. You know, with SOX compliance, which is a joke, but we also had SOC compliance. I did HIPAA compliance for them. Tons of compliance to this. I'm not a compliance officer, though, by trade. So, I started seeing [x cards 00:17:54], you know, these company personal cards, and people would go out and [unintelligible 00:17:57] platform because if they worked with my teams internally, if they wanted to get a small app deployed, it was like a two- to three-month process. That process was long because of CFO overhead, approvals, vendor data security vetting, racking machines. It wasn't a problem that was inherent to the technology. I actually built a self-service cloud in that company. The problem was governance. It was financial approvals, it was product justification.So, I think AWS is really what made the internet inflect and scale and innovate amazingly. But I think that one of the things that it sacrificed was governance. So, if you tie a lot of what we're saying back together, by using some sort of tool that you can pop into a cloud environment and that can access a hundred percent of the infrastructure and look for risks, what you're doing is you're kind of X-Ray visioning into all these nodes that were deployed rapidly and kept around because they were crown jewels, and you're determining the risks that lie on them. So, let's say that 10 or 15% of your estate is prototype things that grew at a scale and we can't pull back into our governance infrastructure. A lot of times people think that those types of team machines are probably pretty locked down and they're probably low risk.If you throw a company on the side scanner or something like that, you'll see they have 90% of the risk, 80% of the risk. They're unpatched and they're old. So, I remember at one point in my career, right, I'm thinking Amazon's great. I'm—[unintelligible 00:19:20] on Amazon because they've made the internet go, they influxed. I mean, they've scaled us up like crazy.Corey: Oh, the capability store is phenomenal. No argument there.Jack: Yeah.
The governance problem, though, you know—there are a lot of hacks because of people using AWS poorly.Corey: And to be clear, that's everyone. We all are. I take a look at some of the horrible technical decisions I made even a couple of years ago, and based upon what I know now, it's difficult to back out and wind up doing things the proper way. I wrote an article a while back, “17 Ways to Run Containers on AWS,” and listed all the services. And I think it was a little on the nose, but then I wrote “17 More Ways to Run Containers on AWS,” with different services. And I'm about three-quarters of the way through the third in the series. I just need a couple more releases and we're good to go.Jack: The more and more complexity you add, the more security risk exists. And I've heard horror stories. Dictionary.com lost a lot of business once because a couple of former contractors deleted some instances in AWS. Before that, they had a secret machine they turned into a pixel [unintelligible 00:20:18] and had to take down their iPhone app.I've seen some stuff. But one of the interesting things about deploying one of these tools in AWS, they can just, you know, look X-Ray vision on into all your compute, all your storage and say, “You have PII stored here, you have personal data stored here, you have this vulnerability, that vulnerability, this machine has already been compromised,” is you can take that to your CEO as a CISO and say, “Look, we were wrong, there's a lot of risk here.” And then what I've done in the past is I've used that to deploy HIDS—XDR, telemetry at scale, whatever you want to call it—these agent-based solutions, I've used that as justification for them. Now, the problem with these agentless solutions is almost all of them are just in the cloud. So, they cover just a portion of your infrastructure.So, if you're a hybrid environment with data centers, you're ignoring the data centers. So, it's interesting because I've seen these companies position themselves as competitors when really, they're in complementary spaces, but one of them justified the other for me. So, I mean, what do you think about that awkward competition? Why does this competition exist between these people if they do completely different things?Corey: I'll take it a step further. I'm a big believer that security for the cloud providers should not be a revenue generator in any meaningful sense because at that point, they wind up with an inherent conflict of interest, where when they start charging, especially trying to do value-based pricing as they move up the stack, what they're inherently saying is, great, you can get a version of our services that is less secure; so what they're doing is making security on their platform an inherent investment decision. And I've never been a big believer in that approach.Jack: The SSO tax.Corey: Oh, yes. And many others.Jack: Yeah. So, I was one of the first SSO tax contributors. That started it.Corey: You want data plane audit logging? Great, that'll cost you. But they finally gave in a couple of years back and made the first management trail for CloudTrail audit logging free for everyone. And people still inadvertently built second ones and then wonder why they're paying through the nose. Like, “Oh, that's 40 grand a month. That should be zero.” Great. Send that to your SIEM and then have that pass it out to where it needs to go.
But so much of it is just these weird configuration taxes that people aren't fully aware exist.Jack: It's the market, right? The market is—so look at Amazon's IAM. It is amazing, right? It's totally robust, but who is using it correctly? I know a lot of people are. I've been the CISO for over 100 companies and IAM was one of those things that people don't know how to use, and I think the reason is because people aren't paying for it, so AWS can continue to innovate on it.So, we find ourselves with this huge influx of IAM tools in the startup scene. We all know Uptycs does some CIAM and some identity management stuff. But that's a great example of what you're talking about, right? These cloud companies are not making the things inherently secure, but they are giving some optionality. The products don't grow because they're not being consumed.And AWS doesn't tend to advertise them as much as the folks in the security industry. It's been one complaint of mine, right? And I absolutely agree with you. Most of the breaches are coming out of AWS. That's not AWS's fault. AWS's infrastructure isn't getting breached.It's the way that the customers are configuring the infrastructure. That's going to change a lot soon. We're starting to see a lot of change. But the fundamental issue here is that security needs to be invested in for short-term initiatives, not just for long-term initiatives. Customers need to care about security, not compliance. Customers need to see proof of security. A customer should be demanding that they're using a secure company. If you've ever been on the vendor approval side, you'll see it's very hard to push back on an insecure company going through the vendor process.Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.Corey: Oh, yes. I wound up giving probably about 100 companies now S3 Bucket Negligence Awards for being public about failing to secure their data and put that out into the world. I had one physical bucket made, the S3 Bucket Responsibility Award, and presented it to their then director of security over at the Pokémon Company because there was a Wall Street Journal article talking about how their security review—given the fact that they are a gaming company that has children as their primary customer—they take it very seriously. And they cited that the reason they chose not to do business with one unnamed vendor was in part due to the lackadaisical approach around S3 bucket control. So, that was the one time I've seen in public a reference where, “Yeah, we were going to use a vendor and their security story was terrible, and we decided not to.”It's, why is that news? That should be a much more common story, but these days, it feels like procurement is rubber-stamping it and, like, “Okay, great. Fill out the form.” And, “Okay, you gave some wrong answers on the form. Try it again and tell the story differently until it gets shoved through.” It feels like it's a rubber stamp rather than a meaningful control.Jack: It wasn't a rubber stamp for me when I worked in it.
And I'm a big guy, so they come to me, you know, like—that's how it's been, like, my whole career: it's just being big and intimidating. Because that's—I mean security kind of is that way. But, you know, I've got a story for you. This one's a little more bleak.I don't know if you've heard of a company called Ask.fm—and I'll mention them by name—right, because, well, I worked for a company that did, like, a hostile takeover of this company. And that's when I started working with [unintelligible 00:25:23]. [unintelligible 00:25:24]. I speak Russian and I learned it for work. I'm not Russian, but I learned the language so that I could do my job.And I was working for a company with a similar name. And we were in board meetings and we were crying, literally shedding tears in the boardroom because this other company was being mistaken for us. And the reason why we were shedding tears is because young women—you know, 11 to 13—were committing suicide because of online bullying. They had no health and safety department, no security department. We were furious.So, the company was hosted in Latvia, and we went over there and we installed one. I lived in Latvia for quite a bit, working as the CISO to install a security program, along with the health and safety person to install the moderation team. This is what we need to do in the industry, especially when it comes to children, right? Will regulation solve it? I don't know.But what you were talking about with the Pokémon video game, I remember that, right? We can't have that kind of data being leaked. These are children. We need to protect them with information security. And in education technology, I'll tell you, it's just not a budget priority.So, the parents need to demand the security, we need to demand these audit certifications, and we need to demand that our audit firms are audited better. Our audit firms need to be explaining to security leaders that the control frameworks are something that they're responsible for creating bespoke. I did a presentation with Al Kingsley recently about security compliance, comparing FERPA and COPPA to the GDPR. And it was very interesting because FERPA has very little teeth, it's very long code, and GDPR is relatively brilliant. GDPR made some changes. FERPA was so ambiguous and vague, it made a lot of changes, but they were kind of like, in any direction ever because nobody knows what FERPA is. So, I don't know, what's the answer to that? What do we do?Corey: Yeah. The challenge is, you can see a lot of companies in specific areas doing the right thing, when they're intentionally going out on day one to, for example, service kids as a primary user base demographic. The challenge that you see with this is that, that's great, but then you have things that are not starting off with that point of view. And they started running into population limits and realize, okay, we've got to start expanding our user base somewhere, and then they went bolting on those things almost as an afterthought, where, “Oh, well, we've been basically misusing people's data for our entire existence, but now—now—we're suddenly magically going to do the right thing where kids are concerned.” I wish, but unfortunately that philosophy assumes a better take on humanity than is readily apparent.Jack: I wonder why they do that though, right? Something's got to, you know, news happened or something and that's why they're doing it. And that's not okay. But I have seen companies, one of the founders of Scantron—do you know what a Scantron is?Corey: Oh, yes.
I'm much older than I look.Jack: Yeah, I'm much older than I look, too. I like to think that. But for those that don't know, with a Scantron, you used a number two pencil and filled in these little dots. And it was for taking tests. So, the guy who started Scantron created a small two-person company.And AWS did something magnificent. They recognized that it was an education technology company, and they gave them, for free, security consultation services, security implementation services. And when we bought this company—I'm heavily involved in M&A, right—I'm sitting down with the two founders of the company, and my jaw is on the desk. They were more secure than a lot of the companies that I've worked with that had robust security departments. And I said, “How did you do this?”They said, “AWS provided us with this free service because we're education technology.” I teared up. My heart was—you know, that's amazing. So, there are companies that are doing this right, but then again, look at Grammarly. I hate to pick on Grammarly. LanguageTool is an open-source, I believe privacy-centric, Grammarly competitor, but Grammarly, invest in your security a little more, man. Y'all were breached. They store a lot of data, they [unintelligible 00:29:10] a lot of the data.Corey: Oh, and it scared the living hell out of companies realizing that they had business users using Grammarly as an extension to work on internal documents and just sending proprietary data to some third-party service that they clicked through the terms on, and I don't know that it was ever shown that Grammarly was misusing any of that, but the potential for that is massive.Jack: Do you know what they were doing with it?Corey: Well, using AI to learn these things. Yeah, but the supervision story always involves humans reading it.Jack: They were building a—and I think—nobody knows the rumor, but I've worked in the industry, right, pretty heavily. They're doing something great for the world. I believe they're building a database of works submitted to do various things with them. One of those things is plagiarism detection. So, in order to do that they've got to store, like, all of the data that they're processing.Well, if you have all the data that you've done for your company that's sitting in this Grammarly database and they get hacked—luckily, that's a lot of data. Maybe you'll be overlooked. But I've got a data breach database sitting here on my desk. Do you know how many rows it's got? [pause]. Yes, breach database.Corey: Oh, I wouldn't even begin to guess. I know the data volumes that Troy Hunt's Have I Been Pwned? site winds up dealing with and it is… significant.Jack: How many billions of rows do you think it is?Corey: Ah, I'd say 20 as an argument?Jack: 34.Corey: Okay. Yeah, directionally right. Fermi estimation saves us yet again.Jack: [laugh]. The reason I built this breach database is because I thought Covid would slow down and I wanted it to do executive protection. Companies in the education space also suffer from [active 00:30:42] shooters and that sort of thing. So, that's another thing about security, too, is it transcends all these interesting areas, right? Like here, I'm doing executive risk protection by looking at open-source data.Protect the executives, show the executives that security is a concern, and these executives will realize security's real. Then they'll pass that security down the list of priorities, and next thing you know, the 50 million active students that are using Turnitin are getting better security.
Because an executive realized, “Hey, wait a minute, this is a real thing.” So, there's a lot of ways around this, but I don't know, it's a big space, there's a lot of competition. There's a lot of companies that are coming in and flashing in the pan.A lot of companies are coming in and building snake oil. How do people know how to determine the right things to use? How do people know what they want to implement? How do people understand that when they deploy a program that only applies to their cloud environment, it doesn't touch their on-prem, where a lot of data might be at risk? And how do we work together? How do we get teams like DevOps, IT, SecOps, to not fight each other for installing an agent for doing this?Now, when I looked at Uptycs, I said, “Well, it does the EDR for corp stuff, it does the host intrusion detection, you know, the agent-based stuff, I think, well, because it uses a buzzword I don't like to use: osquery. It's got a bunch of cloud security configuration on it, which is pretty commoditized. It does agentless cloud scanning.” And it—really, I spent a lot of my career just struggling to find these tools. I've written some myself.And when I saw Uptycs, I was—I felt stupid. I couldn't believe that I hadn't used this tool, I think maybe they've substantially increased their capabilities, but it was kind of amazing to me that I had spent so much of my time and energy and hadn't found them. Luckily, I decided to joi—actually I didn't decide to join; they kind of decided for me—and they started giving it away for free. But I found that Uptycs needs a, you know, they need a brand refresh. People need to come and take a look and say, “Hey, this isn't the old Uptycs. Take a look.”And maybe I'm wrong, but I'm here as a technology evangelist, and I'll tell you right now, the minute I'm no longer an evangelist for this technology, the minute I'm no longer passionate about it, I can't do my job. I'm going to go do something else. So, I'm the one guy who will get down to brass tacks with you. I want this thing to be the thing I've been passionate about for a long time. I want people to use it.Contact me directly. Tell me what's wrong with it. Tell me I'm wrong. Tell me I'm right. I really just want to wrap my head around this from the industry perspective, and say, “Hey, I think that these guys are willing to make the best thing ever.” And I'm the craziest person in security. Now, Corey, who's the craziest person in security?Corey: That is a difficult question with many wrong answers.Jack: No, I'm not talking about McAfee, all right. I'm not that level of crazy. But I'm talking about, I was obsessed with this XDR, CDR, all the acronyms. You know, we call it HIDS, I was obsessed with it for years. I worked for all these companies.I quit doing, you know, a lot of very good entrepreneurial work to come work at this company. So, I really do think that they can fix a lot of this stuff. I've got my fingers crossed, but I'm still staying involved in other things to make these technologies better. And the software security space is going all over the place. Sometimes it's going in a bad direction, sometimes it's going in good directions. But I agree with you about Amazon producing tools. I think it's just all market-based. People aren't going to use the complex tools of Amazon when there's all this other flashy stuff being advertised.
If people want to learn more, where should they go?Jack: Oh, gosh, everywhere. But if you want to learn more about Uptycs, why don't you just email me?Corey: We will, of course, put your email address into the show notes.Jack: Yeah, we'll do it.Corey: Don't offer if you're not serious. There's also uptycssecretmenu.com, which is apparently not much of a secret, given the large banner all over Uptycs' website.Jack: Have you seen this? Let me just tell you about this. This is not a catch. I was blown away by this; it's one of the reasons I joined. For a buck, if you have between 100 and 1000 nodes, right, you get our agentless system and our agent-based system, right?I think it's only on AWS. But that's, like, what, $150,000, $180,000 value? You get it for a full year. You don't have to sign a contract to renew or anything. Like, you just get it for a buck. Anybody who doesn't go on to the secret menu website and pay $1 and check out this agentless solution that deploys in two minutes: come on, man.I challenge everybody, go on there, do that, and tell me what's wrong with it. Go on there, do that, and give me the feedback. And I promise you I'll do everything in my best efforts to make it the best. I saw the engineering team in this company, they care. Ganesh, the CEO, he is not your average CEO.This guy is a tinkerer. He's on there, hands on keyboard. He responds to me in the middle of the night. He's a geek just like me. But we need users to give us feedback. So, you got this dollar menu, you sign up before the 31st, right? You get the product for a buck. Deploy the thing in two minutes.Then if you want to do the XDR, this agent-based system, you can deploy that at your leisure across whichever areas you want. Maybe you want your corporate network laptops and desktops, your production infrastructure, your compute in the cloud. Deploy it, take a look at it, tell me what's wrong with it, tell me what's right with it. Let's go in there and look at it together. This is my job. I want this company to work, not because they're Uptycs but because I think that they can do it.And this is my personal passion. So, if people hit me up directly, let's chat. We can build a Slack, Uptycs skunkworks. Let's get this stuff perfect. And we're also going to try and get some advisory boards together, like, maybe a CISO advisory board, and just to get more feedback from folks because I think the Uptycs brand has made a huge shift in a really positive direction.And if you look at the great thing here, they're unifying this whole agentless and agent-based stuff. And a lot of companies are saying that they're competing with that, but those two things need to be run together, right? They need to be run together. So, I think the next steps here: check out that dollar menu. It's unbelievable. I can't believe that they're doing it.I think people think it's too good to be true. Y'all got nothing to lose. It's a buck. But if you sign up for it right now, before December 31st, you can just wait and act on it any month later. So, if you sign up for it, you're just locked into the pricing. And then you want to hit me up and talk about it. Is it three in the morning? You got me. Is it eight in the morning? You got me.Corey: You're more generous than I am. It's why I work on AWS bills. It's strictly a business-hours problem.Jack: This is not something that they pay me for. This is just part of my personal passion.
I have struggled to get this thing built correctly because I truly believe not only is it really cool—and I'm not talking about Uptycs, I mean all the companies that are out there—but I think that this could be the most powerful tool in security that makes the world more secure. Like, in a way that keeps up with the security risks increasing.We just need to get customers, we need to get critics, and if you're somebody who wants to come in and prove me wrong, I need help. I need people to take a look at it for me. So, it's free. And if you're in the San Francisco Bay Area and you give me some good feedback and all that, I'll take you out to dinner, I'll introduce you to startup companies that I think, you know, you might want to advise. I'll help out your career.Corey: So, it truly is a dollar menu, then.Jack: Well, I'm paying for the dinner out of my personal thing.Corey: Exactly. Well, again, you're also paying for the infrastructure required to provide the service, so, you know, one way or another, it's all the best—it's just like Cloud, there is no cloud. It's just someone else's cost center. I like that.Jack: Well, yeah, we're paying for a ton of data hosting. This is a huge loss leader. Uptycs has a lot of money in the bank, I think, so they're able to do this. Uptycs just needs to get a little more bold in their marketing because I think they've spent so much time building an awesome product, it's time that we get people to see it. That's why I did this.My career was going phenomenally. I was traveling the world, traveling the country promoting things, just getting deals left and right and then Elias—my buddy over at Orca; Elias, one of the best marketing guys I've ever met—I've never done marketing before. I love this. It's not just marketing. It's like I get to take feedback from people and make the product better and this is what I've been trying to do.So, you're talking to a crazy person in security. I will go well above and beyond. Sign up for that dollar menu. I'm telling you, it is no commitment, maybe you'll get some spam email or something like that. Email me directly, I'll kill the spam email.You can do it anytime before the end of 2023. But it's only for 2023. So, you got a full year of the services for free. For free, right? And one of them takes two minutes to deploy, so start with that one. Let me know what you think. These guys ideate and they pivot very quickly. I would love to work on this. This is why I came here.So, I haven't had a lot of opportunity to work with the practitioners. I'm there for you. I'll create a Slack, we can all work together. I'll invite you to my Slack if you want to get involved in secondaries investing and startup advisory. I'm a mentor and a leader in this space, so for me to be able to stay active, this is like a quid pro quo with me working for this company.Uptycs is the company that I've chosen now because I think that they're the ones that are doing this. But I'm doing this because I think I found the opportunity to get it done right, and I think it's going to be the one thing in security that, when it is perfected, has the biggest impact.Corey: We'll see how it goes over the coming year, I'm sure. Thank you so much for being so generous with your time. I appreciate it.Jack: I like you. I like you, Corey.Corey: I like me too.Jack: Yeah? All right. Okay. I'm telling [unintelligible 00:39:51] something. You and I are very weird.Corey: It works out.Jack: Yeah.Corey: Jack Charles Roehrig, Technology Evangelist at Uptycs.
I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting comment that we're going to be able to pull the exact details of where you left it from because your podcast platform of choice clearly just treated security as a box check.Jack: [laugh].Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
AWS re:Invent announcements
In this episode we are talking about what we would like to see at AWS re:Invent. The AWS internal teams work towards re:Invent to make their big announcements. Sometimes they have announcements beforehand. But the big announcements will be on stage. What would you like to see?
Serverless Services
An increase to the enhancement and evolution of service development capabilities and the ecosystem in general. So more serverless services coming online for items that aren't serverless, and iterating to be more serverless. There's a bunch of things that people are almost afraid of, like observability, deployment patterns and some of the frameworks. So make those a bit easier.
API Gateways
In service development, private API gateways are interesting, but they could be more developer friendly, in terms of setting up, naming and managing. The edge around the environment needs to be tightened up. I don't think they have nailed the data options yet. We are seeing more enhancements around eventing capability like SNS filtering, SQS and X-Ray. These are things that make the serverless experience more rich and all-encompassing.
Developer Enablement
That leads on to developer enablement. You need to think about developer enablement and time to value. Can you get a developer up and running with a productive IDE in the cloud? And can they start delivering value rapidly? We're seeing steps around Cloud IDEs starting to emerge. I want to see what AWS does around their Cloud9 offering, so that you can go from idea to something deployed and running within minutes.
Well Architected
We are big advocates of the well architected framework. Over the last year, we've seen good announcements on well architected capabilities, systems and services coming online within AWS. I want to see a continued evolution of that. And more well architected thinking and characteristics baked into everything AWS is doing. All the way from developer advocates to patterns and code samples. Well architected needs to be better known. A lot of people still don't know what well architected means. There is still work to do to get people to understand what well architected is and what it isn't. And have it present in the tools and not a separate thing that you need to go and do. It should just be there in your job. And there's a second thing. Some of the basic primitives are moving up a level to something more like a pattern. Developer advocates are great at this with Serverless Land. It would be great to see some consistency in those higher-level primitives. It's about fast feedback and, dare I say, the value flywheel. Can we get more feedback faster on our well architected status? Can the pipelines tell us how well architected the changes that we made are? And can our IDE tell us how well architected we are? Stitching that together in a compelling way with fast feedback for the developers would raise the bar on what developers deliver into production. That's effectively platform heuristics. I remember back in the IBM mainframe days, it was a mature platform. And you knew it so well that they started to bake in heuristics. When you start to analyse your code there would be a heuristic to say this might be wrong. Here's what you need to do to fix it. We're starting to see some of these elements emerge already with Security Hub, Resilience Hub and Fault Injection Simulator. It's about stitching those together, like factory mode for workloads in your accounts.
Tell me we're not well architected, and tell me how to fix it. There's probably an extension to that. As we enable developers, I'm seeing good industry examples from new tools that have been launched. In other words, here's a new feature and here's how it applies to your particular industry.
Sustainability
Sustainability is a topic close to our hearts. I haven't heard much about it this last while. More fine-grained analysis of carbon scores would be useful when teams are designing and building their workloads. And getting faster feedback on a particular service and how much it is going to cost you in carbon.
What's the wacky stuff you would like to see?
I'd love to see work on API management and API gateways. Factory mode for your accounts. “How well architected are you?” would be really good, instead of us having to do the reviews and do it manually. The gateways are due an uplift or enhancement. What about documentation? And how you consume information about services. We are seeing an explosion of services on the console. Is there going to be a cut-down console just for your layer? Cost is always a big one. FinOps is a massive growth area with a lot of good tools out in the market. The only other one I can think about is Identity. It will be interesting to see if anything comes out with Cognito.
Serverless Craic from The Serverless Edge
Check out our book The Value Flywheel Effect
Follow us on Twitter @ServerlessEdge
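As a footnote on the eventing enhancements this episode mentions, SNS message filtering in particular, here is a hedged boto3 sketch of a subscription filter policy. The topic and queue ARNs are hypothetical, and the queue's access policy would separately need to allow the topic to send messages.

import json
import boto3

sns = boto3.client("sns")

# Hypothetical ARNs; substitute your own topic and queue.
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:orders"
QUEUE_ARN = "arn:aws:sqs:eu-west-1:123456789012:refunds-queue"

# Only messages whose "event_type" attribute equals "refund" reach the
# queue, so the consumer never processes irrelevant events.
sns.subscribe(
    TopicArn=TOPIC_ARN,
    Protocol="sqs",
    Endpoint=QUEUE_ARN,
    Attributes={
        "FilterPolicy": json.dumps({"event_type": ["refund"]}),
    },
)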
About BenBen Whaley is a staff software engineer at Chime. Ben is co-author of the UNIX and Linux System Administration Handbook, the de facto standard text on Linux administration, and is the author of two educational videos: Linux Web Operations and Linux System Administration. He has been an AWS Community Hero since 2014. Ben has held Red Hat Certified Engineer (RHCE) and Certified Information Systems Security Professional (CISSP) certifications. He earned a B.S. in Computer Science from Univ. of Colorado, Boulder.Links Referenced: Chime Financial: https://www.chime.com/ alternat.cloud: https://alternat.cloud Twitter: https://twitter.com/iamthewhaley LinkedIn: https://www.linkedin.com/in/benwhaley/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.Basically you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive? Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn and this is an episode unlike any other that has yet been released on this august podcast. Let's begin by introducing my first-time guest somehow because apparently an invitation got lost in the mail somewhere. Ben Whaley is a staff software engineer at Chime Financial and has been an AWS Community Hero since Andy Jassy was basically in diapers, to my level of understanding. Ben, welcome to the show.Ben: Corey, so good to be here. Thanks for having me on.Corey: I'm embarrassed that you haven't been on the show before. You're one of those people that slipped through the cracks and somehow I was very bad at following up slash hounding you into finally agreeing to be here. But you certainly waited until you had something auspicious to talk about.Ben: Well, you know, I'm the one that really should be embarrassed here. You did extend the invitation and I guess I just didn't feel like I had something to drop. But I think today we have something that will interest most of the listeners without a doubt.Corey: So, folks who have listened to this podcast before, or read my newsletter, or follow me on Twitter, or have shared an elevator with me, or at any point have passed me on the street, have heard me complain about the Managed NAT Gateway and its egregious data processing fee of four-and-a-half cents per gigabyte. And I have complained about this for small customers because they're in the free tier; why is this thing charging them 32 bucks a month?
And I have complained about this on behalf of large customers who are paying the GDP of the nation of Belize in data processing fees as they wind up shoving very large workloads to and fro, which is, I think, part of the prerequisite requirements for having a data warehouse. And you are no different than the rest of these people who have those challenges, with the singular exception that you have done something about it, and what you have done is so, in retrospect, blindingly obvious that I am embarrassed the rest of us never thought of it.
Ben: It's interesting because when you are doing engineering, it's often the simplest solution that is the best. I've seen this repeatedly. And it's a little surprising that it didn't come up before, but I think it's, in some ways, just a matter of timing. But what we came up with—and is this the right time to get into it? Do you want to just kind of name the solution here?
Corey: Oh, by all means. I'm not going to steal your thunder. Please, tell us what you have wrought.
Ben: We're calling it AlterNAT, and it's an alternative high-availability NAT solution. As everybody knows, NAT Gateway is sort of the default choice; it certainly is what AWS pushes everybody towards. But there is, in fact, a legacy solution: NAT instances. These were around long before NAT Gateway made an appearance. And like I said, they're considered legacy, but with the help of lots of modern AWS innovations and technologies like Lambdas and auto-scaling groups with max instance lifetimes and the latest generation of enhanced-networking instances, it turns out that we can maybe not quite be as effective as a NAT Gateway, but we can save a lot of money and skip those data processing charges entirely by having a NAT instance solution with a failover NAT Gateway, which I think is kind of the key point behind the solution. So, are you interested in diving into the technical details?
Corey: That is very much the missing piece right there. You're right. What we used to use was NAT instances. That was the thing that we used because we didn't really have another option. And they had an interface in the public subnet where they lived and an interface hanging out in the private subnet, and they had to be configured to wind up passing traffic to and fro.
Well, okay, that's great and all, but isn't that kind of brittle and dangerous? I basically have a single instance as a single point of failure, and these were the early days, when individual instances did not have the level of availability and durability they do now. Yeah, it's kind of awful, but here you go. I mean, the most galling part of the Managed NAT Gateway service is not that it's expensive; it's that it's expensive, but also incredibly good at what it does. You don't have to think about this whole problem anymore, and as of recently, it also supports IPv6-to-IPv4 translation as well.
It's not that the service is bad. It's that the service is stonkingly expensive, particularly at scale. And everything that we've seen before is either, oh, run your own NAT instances, or bend your knee and pay your money. And a number of folks have come up with different options where this is ridiculous. Just go ahead and run your own NAT instances.
Yeah, but what happens when I have to take it down for maintenance or replace it? It's like, well, I guess you're not going to the internet today.
This has the, in hindsight, obvious solution of, well, we just also run the Managed NAT Gateway, because the 32 bucks a month in hourly charges don't actually matter at any point of scale when you're doing this, but you wind up using the NAT instance for day-in, day-out traffic, and the failover mode is simply that you'll use the expensive Managed NAT Gateway until the instance is healthy again and then automatically change the route table back and forth.
Ben: Yep. That's exactly it. So, the auto-scaling NAT instance solution has been around for a long time, well before even NAT Gateway was released. You could have NAT instances in an auto-scaling group where the size of the group was one, and if the NAT instance failed, it would just replace itself. But this left a period in which you'd have no internet connectivity while the NAT instance was swapped out.
So, the solution here is that when auto-scaling terminates an instance, it fails over the route table to a standby NAT Gateway, rerouting the traffic. So, there's never a point at which there's no internet connectivity, right? The NAT instance is running, processing traffic, and gets terminated after a certain period of time—configurable: 14 days, 30 days, whatever makes sense for your security strategy. Could be never, right? You could choose to have your own maintenance window in which to do it.
Corey: And let's face it, this thing is more or less sitting there as a network traffic router, for lack of a better term. There is no need to ever log into the thing and make changes to it until and unless there's a vulnerability that you can exploit via somehow just talking to the TCP stack when nothing's actually listening on the host.
Ben: You know, you can run your own AMI that has been pared down to almost nothing, and that instance doesn't do much. It's using just a Linux kernel to sit on two networks and pass traffic back and forth. It has a translation table that keeps track of the state of connections, so you don't need to have any services running. To manage the system, we have SSM, so you can use Session Manager to log in, but frankly, you can just disable that. You almost never even need to get a shell. And that is, in fact, an option we have in the solution: to disable SSM entirely.
Corey: One of the things I love about this approach is that it is turnkey. You throw this thing in there and it's good to go. And in the event that the instance becomes unhealthy, great, it fails traffic over to the Managed NAT Gateway while it terminates the old node and replaces it with a healthy one, and then fails traffic back. Now, I do need to ask, what is the story of network connections during that failover and failback scenario?
Ben: Right, that's the primary drawback of the solution, I would say: any established TCP connections that are on the NAT instance at the time of a route change will be lost. So, say you have—
Corey: TCP now terminates on the floor.
Ben: Pretty much. The connections are dropped. If you have an open SSH connection from a host in the private network to a host on the internet and the instance fails over to the NAT Gateway, the NAT Gateway doesn't have the translation table that the NAT instance had. And not to mention, the public IP address also changes, because you have an Elastic IP assigned to the NAT instance and a different Elastic IP assigned to the NAT Gateway, and so because that upstream IP is different, the remote host is, like, tracking the wrong IP.
So, those connections, they're going to be lost. So, there are some use cases where this may not be suitable. We do have some ideas on how you might mitigate that—for example, with the use of a maintenance window to schedule the replacement, or replacing less often so it doesn't affect your workflow as much—but frankly, for many use cases, my belief is that it's actually fine. In our use case at Chime, we found that it's completely fine and we didn't actually experience any errors or failures. But there might be some use cases that are more sensitive or less resilient to failure in the first place.
Corey: I would also point out that a lot of how software is going to behave is going to be a reflection of the era in which it was moved to cloud. Back in the early days of EC2, you had no real sense of reliability around any individual instance, so everything was written in a very defensive manner. These days, with instances automatically being able to flow among different hardware so we don't get instance interrupt notifications the way we once did on a semi-constant basis, it more or less presents as bulletproof, so a lot of people are writing software that's a bit more brittle. But it's always been a best practice that when a connection fails—okay, what happens at failure? Do you just give up and throw your hands in the air and shriek for help, or do you attempt to retry a few times, ideally backing off exponentially?
In this scenario, those retries will work. So, it's a question of how well you have built your software. Okay, let's say that you made the worst decisions imaginable, and okay, if that connection dies, the entire workload dies. Okay, you have the option to refactor it to be a little bit better behaved, or alternately, you can keep paying the Managed NAT Gateway tax of four-and-a-half cents per gigabyte in perpetuity, forever. I'm not going to tell you what decision to make, but I know which one I'm making.
Ben: Yeah, exactly. The cost savings potential of it far outweighs the potential maintenance troubles, I guess, that you could encounter. But the fact is, if you're relying on Managed NAT Gateway and paying the price for doing so, it's not as if there's no chance for connection failure. NAT Gateway could also fail. I will admit that I think it's an extremely robust and resilient solution. I've been really impressed with it, especially so after having worked on this project, but it doesn't mean it can't fail.
And beyond that, upstream of the NAT Gateway, something could in fact go wrong. Like, internet connections are unreliable, kind of by design. So, if your system is not resilient to connection failures, like, there's a problem to solve there anyway; you're kind of relying on hope. So, it's kind of a forcing function in some ways to build architectural best practices, in my view.
Corey: I can't stress enough that I have zero problem with the capabilities and the stability of the Managed NAT Gateway solution. My complaints about it start and stop entirely with the price.
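[To make the failover mechanics concrete: a minimal sketch—not the actual AlterNAT implementation—of the route-table swap Ben describes, assuming a Lambda function wired to the auto-scaling group's termination lifecycle hook. All resource IDs are hypothetical placeholders.]

import boto3

ec2 = boto3.client("ec2")

ROUTE_TABLE_ID = "rtb-0123456789abcdef0"  # private subnet route table (hypothetical)
STANDBY_NAT_GW = "nat-0123456789abcdef0"  # standby Managed NAT Gateway (hypothetical)

def handler(event, context):
    # Fired by the auto-scaling termination lifecycle hook: point the
    # private subnets' default route at the standby NAT Gateway while
    # auto-scaling replaces the NAT instance.
    ec2.replace_route(
        RouteTableId=ROUTE_TABLE_ID,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=STANDBY_NAT_GW,
    )

def fail_back(nat_instance_eni_id):
    # Once the replacement NAT instance passes health checks, route
    # day-to-day traffic back through its network interface so the
    # per-gigabyte data processing charge stops accruing.
    ec2.replace_route(
        RouteTableId=ROUTE_TABLE_ID,
        DestinationCidrBlock="0.0.0.0/0",
        NetworkInterfaceId=nat_instance_eni_id,
    )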
Corey: Back when you first showed me the blog post that is releasing at the same time as this podcast—and you can visit that at alternat.cloud—you sent me an early draft of this, and what I loved the most was that your math was off because of a not complete understanding of the gloriousness that is just how egregious the NAT Gateway charges are.
Your initial analysis said, "All right, if you're throwing half a terabyte out to the internet, this has the potential of cutting the bill by"—I think it was $10,000 or something like that. It's, "Oh no, no. It has the potential to cut the bill by an entire twenty-two-and-a-half thousand dollars." Because this processing fee does not replace any egress fees whatsoever. It's purely additive. If you forget to have a free S3 Gateway endpoint in a private subnet, every time you put something into or take something out of S3, you're paying four-and-a-half cents per gigabyte on that, despite the fact that there's no internet transit involved and it's not crossing availability zones. It is simply a four-and-a-half cent fee to retrieve something that has only cost you—at most—2.3 cents per month to store in the first place. Flip that switch, that becomes completely free.
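[The "switch" in question is a Gateway VPC endpoint for S3, which carries no data processing or hourly charge. A minimal sketch with boto3—the VPC and route table IDs are hypothetical placeholders:]

import boto3

ec2 = boto3.client("ec2")

# Route S3 traffic through a free Gateway endpoint so it bypasses the
# NAT path (and its four-and-a-half-cent-per-gigabyte fee) entirely.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)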
Ben: Yeah. I'm not embarrassed at all to talk about the lack of education I had around this topic. The fact is, I'm an engineer primarily, and I came across the cost stuff because it kind of seemed like a problem that needed to be solved within my organization. And if you don't mind, I might just linger on this point and kind of think back a few months. I looked at the AWS bill and I saw this egregious 'EC2 Other' category. It was taking up the majority of our bill. Like, the single biggest line item was EC2 Other. And I was like, "What could this be?"
Corey: I want to wind up flagging that just because it bears repeating, because I often get people pushing back of, "Well, how bad—it's one Managed NAT Gateway. How much could it possibly cost? $10?" No, it is the majority of your monthly bill. I cannot stress that enough.
And that's not because the people who work there are doing anything that they should not be doing or didn't understand all the nuances of this. It's because for the security posture that is required for what you do—you are at Chime Financial, let's be clear here—putting everything in public subnets was not really a possibility for you folks.
Ben: Yeah. And not only that, but there are plenty of services that have to be on private subnets. For example, AWS Glue services must run in private VPC subnets if you want them to be able to talk to other systems in your VPC; like, they cannot live in a public subnet. So essentially, if you want to talk to the internet from those jobs, you're forced into some kind of NAT solution. So, I dug into the EC2 Other category and I started trying to figure out what was going on there.
There's no way—natively—to look at what traffic is transiting the NAT Gateway. There's not an interface that shows you what's going on or what the biggest talkers over that network are. Instead, you have to have flow logs enabled and have to parse those flow logs. So, I dug into that.
Corey: Well, you're missing a step first, because in a lot of environments, people have more than one of these things, so you get to first do the scavenger hunt of, okay, I have a whole bunch of Managed NAT Gateways, and first I need to go diving into CloudWatch metrics and figure out which are the heavy talkers. It's usually one or two, followed by a whole bunch of small stuff, but not always, so figuring out which VPC you're even talking about is a necessary prerequisite.
Ben: Yeah, exactly. The data around it is almost missing entirely. Once you come to the conclusion that it is a particular NAT Gateway—like, that's a set of problems to solve on its own—but first, you have to go to the flow logs, you have to figure out what are the biggest upstream IPs that it's talking to. Once you have the IP, it still isn't apparent what that host is. In our case, we had all sorts of outside parties that we were talking to a lot, and it's a matter of sorting by volume and figuring out, well, this IP—what is the reverse IP? Who is potentially the host there?
I actually had some wrong answers at first. I set up VPC endpoints to S3 and DynamoDB and SQS because those were some top talkers, and that was a nice way to gain some security and some resilience and save some money. And then I found, well, Datadog; that's another top talker for us, so I ended up creating a nice private link to Datadog, which they offer for free, by the way, which is more than I can say for some other vendors. But then I found some outside parties where there wasn't a nice private link solution available to us, and yet it was by far the largest volume. So, that's what kind of started me down this track: analyzing the NAT Gateway myself by looking at VPC flow logs. Like, it's shocking that there isn't a better way to find that traffic.
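[A minimal sketch of that flow-log spelunking, assuming the flow logs already land in S3 and are registered as an Athena table named vpc_flow_logs—the table, ENI ID, and results bucket are hypothetical placeholders:]

import boto3

athena = boto3.client("athena")

# Rank destination addresses by bytes to surface the NAT Gateway's
# top talkers (field names follow the default VPC flow log format).
QUERY = """
SELECT dstaddr, SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE interfaceid = 'eni-0123456789abcdef0'
GROUP BY dstaddr
ORDER BY total_bytes DESC
LIMIT 20;
"""

athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)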
Corey: It's worse than that, because VPC flow logs tell you where the traffic is going and in what volumes, sure, on an IP address and port basis, but okay, now you have a Kubernetes cluster that spans two availability zones. Okay, great. What is actually passing through that? So, you have one big application that just seems awfully chatty, you have multiple workloads running on the thing. What's the expensive thing talking back and forth? The only way that you can reliably get the answer to that, I've found, is to talk to people about what those workloads are actually doing, and failing that, you're going code spelunking.
Ben: Yep. You're exactly right about that. In our case, it ended up being apparent because we have a set of subnets where only one particular project runs. And when I saw the source IP, I could immediately figure that part out. But if it's a K8s cluster in the private subnets, yeah, how are you going to find it out? You're going to have to ask everybody that has workloads running there.
Corey: And we're talking about, in some cases, millions of dollars a month. Yeah, it starts to feel a little bit predatory as far as how it's priced and the amount of work you have to put in to track this stuff down. I've done this a handful of times myself, and it's always painful unless you discover something pretty early on, like, oh, it's talking to S3, because that's pretty obvious when you see that. It's, yeah, flip switch and this entire engagement just paid for itself a hundred times over. Now, let's see what else we can discover.
That is always one of those fun moments because, first, customers are super grateful to learn that, oh, my God, I flipped that switch and I'm saving a whole bunch of money. Because it starts with gratitude. "Thank you so much. This is great." And it doesn't take a whole lot of time for that to alchemize into anger of, "Wait. You mean, I've been being ridden like a pony for this long and no one bothered to mention that if I click a button, this whole thing just goes away?"
And when you mention this to your AWS account team, like, they're solicitous, but they either have to present as, "I didn't know that existed either," which is not a good look, or, "Yeah, you caught us," which is worse. There's no positive story on this. It just feels like a tax on not knowing trivia about AWS. I think that's what really winds me up about it so much.
Ben: Yeah, I think you're right on about that as well. My misunderstanding about the NAT pricing was that data processing is additive to data transfer. I expected, when I replaced NAT Gateway with a NAT instance, that I would be substituting data transfer costs for NAT Gateway data processing costs. But in fact, NAT Gateway incurs both data processing and data transfer; NAT instances only incur data transfer costs. And so, this is a big difference between the two solutions.
Not only that, but if you're in the same region—if you're egressing out of your, say, us-east-1 region and talking to another hosted service also within us-east-1, never leaving the AWS network—you don't actually even incur data transfer costs. So, if you're using a NAT Gateway, you're paying data processing.
Corey: To be clear, you do, but it is cross-AZ in most cases, billed at one penny egressing, and on the other side, that hosted service generally pays one penny ingressing as well. Don't feel bad about that one. That was extraordinarily unclear, and the only reason I know the answer to that is that I got tired of getting stonewalled by people that later turned out didn't know the answer, so I ran a series of experiments designed explicitly to find this out.
Ben: Right. As opposed to the five cents to nine cents that is data transfer to the internet. Add that to data processing on a NAT Gateway and you're paying between nine-and-a-half and thirteen-and-a-half cents for every gigabyte egressed. And this is a phenomenal cost. At any kind of volume, if you're doing terabytes to petabytes, this becomes a significant portion of your bill. And this is why people hate the NAT Gateway so much.
Corey: I am going to short-circuit an angry comment I can already see coming on this, where people are going to say, "Well, yes. But at multi-petabyte scale, nobody's paying on-demand retail price." And they're right. Most people who are transmitting that kind of data have a specific discount rate applied to what they're doing that varies depending upon usage and use case.
Sure, great. But I'm more concerned with the people who are sitting around dreaming up ideas for a company where I want to wind up doing some sort of streaming service. I talked to one of those companies very early on in my tenure as a consultant around the billing piece, and they wanted me to check their napkin math because they thought that, at their numbers, when they wound up scaling up, if their projections were right, they were going to be spending $65,000 a minute, and what did they not understand? And the answer was, well, you didn't understand this other thing, so it's going to be more than that, but no, you're directionally correct. So, that idea that started off on a napkin—of course, they didn't build it on top of AWS; they went elsewhere.
And last time I checked, they'd raised well over a quarter-billion dollars in funding.
So, that's a business that AWS would love to have on a variety of different levels, but they're never going to even be considered, because by the time someone is at scale, they either have built this somewhere else or they went broke trying.
Ben: Yep, absolutely. And we might just make the point there that while you can get discounts on data transfer, you really can't—or it's very rare to—get discounts on data processing for the NAT Gateway. So, any kind of savings you can get on data transfer would apply to a NAT instance solution, saving you four-and-a-half cents per gigabyte inbound and outbound over the NAT Gateway equivalent solution. So, you're paying a lot for the benefit of a fully-managed service there. A very robust, nicely engineered fully-managed service, as we've already acknowledged, but an extremely expensive solution for what it is, which is really just a proxy in the end. It doesn't add any value to you.
Corey: The only way to make that more expensive would be to route it through something like Splunk or whatnot. And Splunk does an awful lot for what they charge per gigabyte, but it just feels like it's rent-seeking in some of the worst ways possible. And what I love about this is that you've solved the problem in a way that is open-source, and you have already released it in Terraform code. I think one of the first to-dos on this for someone is going to be, okay, now also make it CloudFormation and also make it CDK so you can drop it in however you want.
And anyone can use this. I think the biggest mistake people might make in glancing at this is, well, I'm looking at the hourly charge for the NAT Gateways and that's 32-and-a-half bucks a month, and the instances that you recommend are hundreds of dollars a month for the big network-optimized stuff. Yeah, if you care about the hourly rate of either of those two things, this is not for you. That is not the problem that it solves. If you're an independent learner annoyed about the $30 charge you got for a Managed NAT Gateway, don't do this. This will only add to your billing concerns.
Where it really shines is once you're at, I would say, probably about ten terabytes a month, give or take, in Managed NAT Gateway data processing; that is where it starts to make sense to consider this. The breakeven is around six or so, but there is value to not having to think about things. Once you get to that level of spend, though, it's worth devoting a little bit of infrastructure time to something like this.
Ben: Yeah, that's effectively correct. The total cost of running the solution, like, all-in: there are eight Elastic IPs, four NAT Gateways—say you're in four zones; could be less if you're in fewer zones—like, n NAT Gateways, n NAT instances, depending on how many zones you're in, and I think that's about it. And I said right in the documentation, if any of those baseline fees are a material number for your use case, then this is probably not the right solution. Because we're talking about saving thousands of dollars. Any of these small numbers for NAT Gateway hourly costs, NAT instance hourly costs—that shouldn't be a factor, basically.
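[A rough sketch of the breakeven arithmetic being described, using on-demand list prices; actual numbers vary by region, discount, and instance sizing:]

# Managed NAT Gateway data processing vs. running a NAT instance.
DATA_PROCESSING_PER_GB = 0.045  # NAT Gateway data processing, per GB
NAT_INSTANCE_MONTHLY = 300.0    # hypothetical network-optimized instance
# (The standby NAT Gateway's ~$32.85/month hourly charge is paid in
# both scenarios, so it drops out of the comparison.)

def monthly_net_savings(gb_processed):
    # Savings = avoided data processing fees minus the instance cost.
    return gb_processed * DATA_PROCESSING_PER_GB - NAT_INSTANCE_MONTHLY

for tb in (1, 6, 10, 100):
    print(f"{tb:>4} TB/month: net ${monthly_net_savings(tb * 1000):>9,.2f}")
# Breakeven lands around 6-7 TB/month; at 100 TB/month the NAT
# instance saves roughly $4,200 every month.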
Corey: Yeah, it's like when I used to worry about costing my customers a few tens of dollars in Cost Explorer or CloudWatch or request fees against S3 for their Cost and Usage Reports. It's, yeah, that does actually have a cost, there's no real way around it, but look at the savings they're realizing by going through that. Yeah, they're not going to come back and complain about their five-figure consulting engagement costing an additional $25 in AWS charges and then lowering it by a third. So, there's definitely a difference as far as how those things tend to be perceived. But it's easy to miss the big stuff when chasing after the little stuff like that.
This is part of the problem I have with an awful lot of cost tooling out there. They completely ignore cost components like this and focus only on the things that are easy to query via API—oh, we're going to cost-optimize your Kubernetes cluster—when all they think about is compute and RAM. And, okay, that's great, but you're completely ignoring all the data transfer because there's still no great way to get at that programmatically. And it really is missing the forest for the trees.
Ben: I think this is key to any cost reduction project or program that you're undertaking. When you look at a bill, look for the biggest spend items first and work your way down from there, just because of the impact you can have. And that's exactly what I did in this project. I saw that 'EC2 Other' slash NAT Gateway was the big item and I started brainstorming ways that we could go about addressing that. And now I have my next targets in mind: now that we've reduced this cost to effectively nothing—extremely low compared to what it was—we have other new line items on our bill that we can start optimizing. But in any cost project, start with the big things.
Corey: You have come a long way around to answer a question I get asked a lot, which is, "How do I become a cloud economist?" And my answer is, you don't. It's something that happens to you. And it appears to be happening to you, too. My favorite part about the solution that you built, incidentally, is that it is being released under the auspices of your employer, Chime Financial, which is immune to being acquired by Amazon just to kill this thing and shut it up.
Because Amazon already has something shitty called Chime. They don't need to wind up launching something else or acquiring something else and ruining it because they have a Slack competitor of sorts called Amazon Chime. There's no way they could acquire you [unintelligible 00:27:45] going to get lost in the hallways.
Ben: Well, I have confidence that Chime will be a good steward of the project. Chime's goal and mission as a company is to help everyone achieve financial peace of mind, and we take that really seriously. We even apply it to ourselves, and that was kind of the impetus behind developing this in the first place. You mentioned earlier we have Terraform support already, and you're exactly right: I'd love to have CDK, CloudFormation, and Pulumi support, and other kinds of contributions are more than welcome from the community.
So, if anybody feels like participating, if they see a feature that's missing, let's make this project the best that it can be. I suspect we can save many companies hundreds of thousands or millions of dollars. And this really feels like the right direction to go in.
Corey: This is easily a multi-billion dollar savings opportunity, globally.
Ben: That's huge. I would be flabbergasted if that was the outcome of this.
Corey: The hardest part is reaching these people and getting them on board with the idea of handling this. And again, I think there's a lot of opportunity for the project to evolve in the sense of different settings depending upon risk tolerance.
I can easily see a scenario where, in the event of a disruption to the NAT instance, it fails over to the Managed NAT Gateway, but failback becomes manual, so you don't have a flapping route table back and forth or a [hold 00:29:05] downtime or something like that. Because again, in that scenario, the failure mode is just, well, you're paying four-and-a-half cents per gigabyte for a while until you wind up figuring out what's going on, as opposed to the failure mode of disrupting connections on an ongoing basis, and for some workloads, that's not tenable. This is absolutely, for the common case, the right path forward.
Ben: Absolutely. I think it's an enterprise-grade solution, and the more knobs and dials we add to tweak, the more robust and adaptable it becomes to different kinds of use cases. But the best outcome here would actually be that the entire solution becomes irrelevant because AWS fixes the NAT Gateway pricing. If that happens, I will consider the project a great success.
Corey: I will be doing backflips like you wouldn't believe. I would sing their praises day in, day out. I'm not saying reduce it to nothing, even. I'm not saying it adds no value. I would change the way that it's priced, because honestly, the fact that I can run an EC2 instance and be charged $0 on a per-gigabyte basis—yeah, I would pay a premium on an hourly charge based upon traffic volumes, but don't meter per gigabyte. That's where it breaks down.
Ben: Absolutely. And why is it additive to data transfer, also? Like, I remember first starting to use VPC when it was launched and reading about the NAT instance requirement and thinking, "Wait a minute. I have to pay this extra management and hourly fee just so my private hosts can reach the internet? That seems kind of janky."
And Amazon established a norm here, because Azure and GCP both have their own equivalent of this now. This is a business choice. This is not a technical choice. They could just run this under the hood and not charge anybody for it, or build in the cost, and it wouldn't be this thing we have to think about.
Corey: I almost hate to say it, but Oracle Cloud does, for free.
Ben: Do they?
Corey: It can be done. This is a business decision. It is not a technical capability issue where, well, it does incur cost to run these things. I understand that and I'm not asking for things for free. I very rarely say that something is overpriced when I'm talking about AWS billing issues. I'm talking about it being unpredictable, I'm talking about it being impossible to see in advance, but the fact that it costs too much money is rarely my complaint. In this case, it costs too much money. Make it cost less.
Ben: If I'm not mistaken, GCP's equivalent solution is the exact same price. It's also four-and-a-half cents per gigabyte. So, that shows you that there are business games being played here. Like, Amazon could get ahead and do right by the customer by dropping this to a much more reasonable price.
Corey: I really want to thank you both for taking the time to speak with me and building this glorious, glorious thing. Where can we find it? And where can we find you?
Ben: alternat.cloud is going to be the place to visit. It's on Chime's GitHub, which will be released by the time this podcast comes out. As for me, if you want to connect, I'm on Twitter; @iamthewhaley is my handle. And of course, I'm on LinkedIn.
Corey: Links to all of that will be in the podcast notes. Ben, thank you so much for your time and your hard work.
Ben: This was fun.
Thanks, Corey.
Corey: Ben Whaley, staff software engineer at Chime Financial, and AWS Community Hero. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry rant of a comment that I will charge you not only four-and-a-half cents per word to read, but four-and-a-half cents to reply, because I am experimenting myself with being a rent-seeking schmuck.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
About Kevin
Kevin Miller is currently the global General Manager for Amazon Simple Storage Service (S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. Prior to this role, Kevin has had multiple leadership roles within AWS, including as the General Manager for Amazon S3 Glacier, Director of Engineering for AWS Virtual Private Cloud, and engineering leader for AWS Virtual Private Network and AWS Direct Connect. Kevin was also Technical Advisor to the Senior Vice President for AWS Utility Computing. Kevin is a graduate of Carnegie Mellon University with a Bachelor of Science in Computer Science.
Links Referenced:
snark.cloud/shirt: https://snark.cloud/shirt
aws.amazon.com/s3: https://aws.amazon.com/s3
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is brought to us in part by our friends at Datadog. Datadog is a SaaS monitoring and security platform that enables full-stack observability for modern infrastructure and applications at every scale. Datadog enables teams to see everything: dashboarding, alerting, application performance monitoring, infrastructure monitoring, UX monitoring, security monitoring, dog logos, and log management, in one tightly integrated platform. With 600-plus out-of-the-box integrations with technologies including all major cloud providers, databases, and web servers, Datadog allows you to aggregate all your data into one platform for seamless correlation, allowing teams to troubleshoot and collaborate together in one place, preventing downtime and enhancing performance and reliability. Get started with a free 14-day trial by visiting datadoghq.com/screaminginthecloud, and get a free t-shirt after installing the agent.
Corey: Managing shards. Maintenance windows. Overprovisioning. ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co/screaming. That's GO M-O-M-E-N-T-O dot co slash screaming.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Right now, as I record this, we have just kicked off our annual charity t-shirt fundraiser. This year's shirt showcases S3 as the eighth wonder of the world. And here to either defend or argue the point—we're not quite sure yet—is Kevin Miller, AWS's vice president and general manager for Amazon S3. Kevin, thank you for agreeing to suffer the slings and arrows that are no doubt going to be interpreted, misinterpreted, et cetera, for the next half hour or so.
Kevin: Oh, Corey, thanks for having me. And happy to do that, and really flattered for you to be thinking about S3 in this way. So more than happy to chat with you.
Corey: It's absolutely one of those services that is foundational to the cloud. It was the first AWS service that was put into general availability, although the beta folks are going to argue back and forth about no, no, that was SQS instead.
I feel like now that Mai-Lan handles both SQS and S3 as part of her portfolio, she is now the final arbiter of that. I'm sure that's an argument for a future day. But it's impossible to imagine cloud without S3.
Kevin: I definitely think that's true. It's hard to imagine cloud, actually, without many of our foundational services, including SQS, of course, but yes, we were the first generally available service with S3. And we're pretty happy with our anniversary being Pi Day, 3/14.
Corey: I'm also curious, your own personal trajectory has been not necessarily what folks would expect. You were the general manager of Amazon Glacier, and now you're the general manager and vice president of S3. So, I've got to ask, because there are conflicting reports on this depending upon what angle you look at: are Glacier and S3 the same thing?
Kevin: Yes, I was the general manager for S3 Glacier prior to coming over to S3 proper, and the answer is no, they are not the same thing. We certainly have a number of technologies that we're able to use on both S3 and Glacier, but there are certainly a number of things that are very distinct about Glacier and give us the ability to hit the ultra-low price points that we do, with Glacier Deep Archive being as low as $1 per terabyte-month. And so, there's a lot of actual ingenuity up and down the stack, from hardware to software, everywhere in between, to really achieve that with Glacier. But then there are other spots where S3 and Glacier have very similar needs, and then, of course, today many customers use Glacier through S3 as a storage class in S3, and so that's a great way to do that. So, there's definitely a lot of shared code, but certainly, when you get into it, there's [unintelligible 00:04:59] to both of them.
Corey: I ran a number of obnoxiously detailed financial analyses, and they all came away with: unless you have a very specific, very nuanced understanding of your data lifecycle and/or it is less than 30 or 60 days depending upon a variety of different things, the default S3 storage class you should be using for virtually anything is Intelligent-Tiering. That is my purely economic analysis of it. Do you agree with that? Disagree with that? And again, I understand that all of these storage classes are like your children, and I am inviting you to tell me which one of them is your favorite, but I'm absolutely prepared to do that.
Kevin: Well, we love Intelligent-Tiering because it is very simple; customers are able to automatically save money using Intelligent-Tiering for data that's not being frequently accessed. And actually, since we launched it a few years ago, we've already saved customers more than $250 million using Intelligent-Tiering. So, I would say today it is our default recommendation in almost every case. I think that the cases where we would recommend another storage class as the primary storage class tend to be use cases where customers really have a good understanding of the access patterns. And we see some customers do that: for a certain dataset, they know that it's going to be heavily accessed for a fixed period of time, or the data is actually archival—it'll never be accessed, or very rarely if ever, maybe just in an emergency.
And in those kinds of use cases, I think customers are probably best served choosing one of the specific storage classes where they're paying the lower cost from day one.
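[A minimal sketch of opting into Intelligent-Tiering with boto3—the bucket, key, and file are hypothetical placeholders:]

import boto3

s3 = boto3.client("s3")

# Opt a single object into Intelligent-Tiering at upload time.
with open("events.parquet", "rb") as f:
    s3.put_object(
        Bucket="my-data-bucket",
        Key="datasets/events.parquet",
        Body=f,
        StorageClass="INTELLIGENT_TIERING",
    )

# Or transition everything already in the bucket via a lifecycle rule.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "all-to-intelligent-tiering",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
        }]
    },
)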
Kevin: But again, I would say for the vast majority of cases that we see, the data access patterns are unpredictable, and customers like the flexibility of being able to very quickly retrieve the data if they decide they need to use it. But in many cases, they'll save a lot of money as the data is not being accessed, and so Intelligent-Tiering is a great choice for those cases.
Corey: I would take it a step further and say that even when customers believe that they are going to be doing a deeper analysis and they have a better understanding of their data flow patterns than Intelligent-Tiering would, in practice, I see that they rarely do anything about it. It's one of those things where they're like, "Oh, yeah, we're going to set up our own lifecycle policies real soon now," whereas, just switch it over to Intelligent-Tiering and never think about it again. People's time is worth so much more than the infrastructure they're working on in almost every case. It doesn't seem to make a whole lot of sense unless you have a very intentioned, very urgent reason to go and do that stuff by hand in most cases.
Kevin: Yeah, that's right. I think I agree with you, Corey. And certainly, that is the recommendation we lead with customers.
Corey: In previous years, our charity t-shirt has focused on other areas of AWS, and one of them was based upon a joke that I've been telling for a while now, which is that the best database in the world is Route 53 and storing TXT records inside of it. I don't know if I ever mentioned this to you or not, but the first iteration of that joke was featuring around S3. The challenge that I had with it is that S3 Select is absolutely a thing where you can query S3 with SQL, which I don't see people doing anymore because Athena is the easier, more, shall we say, well-articulated version of all of that. And no, no, that joke doesn't work because it's actually true. You can use S3 as a database. Does that statement fill you with dread? Regret? Am I misunderstanding something? Or are you effectively running a giant subversive database?
Kevin: Well, I think that certainly when most customers think about a database, they think about a collection of technology that's applied for given problems, and so I wouldn't count S3 as providing the whole range of functionality that would really make up a database. But I think that certainly a lot of the primitives—and S3 Select is a great example of a primitive—are available in S3. And we're looking at adding, you know, additional primitives going forward to make it possible to, you know, build a database around S3. And as you see, other AWS services have done that in many ways. For example, obviously with Amazon Redshift having a lot of capability now to just directly access and use data in S3 and make that super seamless, so that you can then run data warehousing type queries on top of S3 and on top of your other datasets.
So, I certainly think it's a great building block. And one other thing I would actually just say that you may not know, Corey, is that one of the things we've been doing a lot more with S3 over the last couple of years is actually working to directly contribute improvements to open-source connector software that uses S3, to make available automatically some of the performance improvements that can be achieved using both the AWS SDK and also things like S3 Select. So, we started with a few of those things with Select; you're going to see more of that coming, most likely.
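[For the curious, a minimal S3 Select sketch with boto3—the bucket, key, and column name are hypothetical placeholders:]

import boto3

s3 = boto3.client("s3")

# Run SQL directly against a single CSV object without downloading it.
resp = s3.select_object_content(
    Bucket="my-data-bucket",
    Key="logs/requests.csv",
    Expression="SELECT * FROM S3Object s WHERE s.status = '500'",
    ExpressionType="SQL",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# Results come back as an event stream of Records payloads.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())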
Kevin: And some of that, again—the idea there is that you may not even necessarily know you're using Select, but when we can identify that it will improve performance, we're looking to be able to contribute those kinds of improvements directly—or we are contributing those directly—to those open-source packages. So, one thing I would definitely recommend customers and developers do is have a capability of keeping that software up-to-date, because although it might seem like those are sort of one-and-done kind of software integrations, there's actually almost continuous improvement now going on around things like that capability, and then others we come out with.
Corey: What surprised me is just how broadly S3 has been adopted by a wide variety of different client software packages out there. Back when I was running production environments in anger, I distinctly remember in one Ubuntu environment, we wound up installing a specific package that was designed to teach apt how to retrieve packages and its updates from S3, which was awesome. I don't see that anymore, just because it seems that it is so easy to do it now, just with the native features that S3 offers, and an awful lot of software under the hood has learned to directly recognize S3 as its own thing, and can react accordingly.
Kevin: And just do the right thing. Exactly. No, we certainly see a lot of that. So that's, you know—I mean, obviously making that simple for end customers to use and achieve what they're trying to do, that's the whole goal.
Corey: It's always odd to me when I'm talking to one of my clients who is looking to understand and optimize their AWS bill to see outliers in either direction when it comes to S3 itself. When they're driving large S3 bills, as in a majority of their spend, it's, okay, that is very interesting. Let's dive into that. But almost more interesting to me is when it is effectively not being used at all—when, oh, we're doing everything with EBS volumes or EFS.
And again, those are fine services. I don't have any particular problem with them anymore, but the problem I have is that the cloud long ago took what amounts to an economic vote. There's a tax savings for storing data in an object store the way that you—and by extension, most of your competitors—wind up pricing this, versus the idea of a volume basis where you have to pre-provision things and you don't get any form of durability that extends beyond the availability zone boundary. It just becomes an awful lot of, "Well, you could do it this way. But it gets really expensive really quickly."
It just feels wild to me that there is that level of variance between S3 on just a raw storage basis, economically, as well as then just the, frankly, ridiculous levels of durability and availability that you offer on top of that. How did you get there? Was the service just mispriced at the beginning? Like, oh, we dropped a zero and probably should have put that in there somewhere.
Kevin: Well, no, I wouldn't call it mispriced. I think S3 came about when we spent a lot of time looking at the architecture for storage systems, knowing that we wanted a system that would provide the durability that comes with having three completely independent data centers and the elasticity and capability where, you know, customers don't have to provision the amount of storage they want; they can simply put data and the system keeps growing. And they can also delete data and stop paying for that storage when they're not using it.
And so, just all of that investment and looking at that architecture holistically led us down the path to where we are with S3.
And we've definitely talked about this. In fact, in Peter's keynote at re:Invent last year, we talked a little bit about how the system is designed under the hood, and one of the things you realize is that S3 gets a lot of the benefits that we do just from the overall scale. I think the stat is that at this point more than 10,000 customers have data that's stored on more than a million hard drives in S3. And the way you get that scale and capability is through massive parallelization. Customers that are, I would say, building more traditional architectures are typically building inherently much more siloed architectures with a relatively small scale overall, and it ends up with a lot of resources provisioned in sort of small chunks, so you never get to that scale where you can start to take advantage of the sum being more than the parts.
And so, I think that's what the recognition was when we started out building S3. And then, of course, we offer that as an API on top of that, where customers can consume whatever they want. That is, I think, where S3, at the scale it operates, is able to do certain things, including on the economics, that are very difficult or even impossible to do at a much smaller scale.
Corey: One of the more egregious clown-shoe statements that I hear from time to time has been when people will come to me and say, "We've built a competitor to S3." And my response is always one of those, "Oh, this should be good." Because when people say that, they generally tend to be focusing on one or maybe two dimensions that don't work for a particular use case as well as they could. "Okay, what was your story around why this should be compared to S3?" "Well, it's an object store. It has full S3 API compatibility." "Does it really? Because I have to say, there are times where I'm not entirely convinced that S3 itself has full compatibility with the way that its API has been documented."
And there's an awful lot of magic that goes into this too. "Okay, great. You're running an S3 competitor. Great. How many buildings does it live in?" Like, "Well, we have a problem with the s at the end of that word." It's, "Okay, great. If it fits on my desk, it is not a viable S3 competitor. If it fits in a single zip code, it is probably not a viable S3 competitor." Now, can it be an object store? Absolutely. Does it provide a new interface to some existing data someone might have? Sure, why not. But I think that "oh, it's S3 compatible" is something that gets tossed around far too lightly by folks who don't really understand what it is that drives S3 and makes it special.
Kevin: Yeah, I mean, I would say certainly there are a number of other implementations of the S3 API, and frankly we're flattered that customers and our competitors and others recognize the simplicity of the API and go about implementing it.
But to your point, I think there's a lot more; it's not just about the API, it's really around everything surrounding S3: from, as you mentioned, the fact that the data in S3 is stored in three independent availability zones, all of which are separated by kilometers from each other, and the resilience, the automatic failover, and the ability to withstand an unlikely impact to one of those facilities, as well as the scalability, and, you know, the fact that we put a lot of time and effort into making sure that the service continues scaling with our customers' needs. And so, I think there's a lot more that goes into what is S3. And oftentimes, a straight-up comparison is purely based on the APIs—and generally a small set of APIs—without those intangibles, or not intangibles, but all of the '-ilities,' right: the elasticity and the durability and so forth that I just talked about. In addition to all that also, you know, certainly what we're seeing for customers is, as they get into the petabyte and tens of petabytes, hundreds of petabytes scale, their need for the services that we provide to manage that storage—whether it's lifecycle and replication, or things like our batch operations to help update and maintain all the storage—those become really essential to customers wrapping their arms around it, as well as visibility: things like Storage Lens to understand, what storage do I have? Who's using it? How is it being used?
And those are all things that we provide to help customers manage at scale. And certainly, you know, oftentimes when I see claims around S3 compatibility, a lot of those advanced features are nowhere to be seen.
Corey: I also want to call out that a few years ago, Mai-Lan got on stage and talked about how, to my recollection, you folks have effectively rebuilt S3 under the hood into, I think it was, 235 distinct microservices at the time. There will not be a quiz on numbers later, I'm assuming. But what was wild to me about that is, having done that for services that are orders of magnitude less complex, it absolutely is like changing the engine on a car without ever slowing down on the highway. Customers didn't know that any of this was happening until she got on stage and announced it. That is wild to me. I would have said before this happened that there was no way that would have been possible, except it clearly was. I have to ask, how did you do that in the broad sense?
Kevin: Well, it's true. A lot of the underlying infrastructure that's been part of S3, both hardware and software, is—you know, if someone from S3 in 2006 came and looked at the system today, they would probably be very disoriented in terms of understanding what was there, because so much of it has changed. To answer your question, the long and short of it is a lot of testing. In fact, a lot of novel testing most recently, particularly with the use of formal logic and what we call automated reasoning. It's also something we've talked a fair bit about at re:Invent.
And that is essentially where you prove the correctness of certain algorithms. And we've used that to spot some very interesting one-in-a-trillion-type cases that at S3 scale happen regularly, that you have to be ready for and you have to know how the system reacts to, even in all those cases. I mean, I think one of our engineers did some calculations that, you know, the number of potential states for S3 sort of exceeds the number of atoms in the universe, or something similarly crazy.
But yet, using methods like automated reasoning, we can test that state space, we can understand what the system will do, and have a lot of confidence as we begin to swap, you know, pieces of the system.
And of course, nothing at S3 scale happens instantly. It's all, you know—I would say that for a typical engineering effort within S3, there's a certain amount of effort, obviously, in making the change or in preparing the new software, writing the new software and testing it, but there's almost an equal amount of time that goes into, okay, what is the process for migrating from System A to System B? And that happens over a timescale of months, if not years, in some cases. And so, there's just a lot of diligence that goes into not just the new systems, but also the process of, you know, literally, how do I swap that engine on the system. So, you know, it's a lot of really hard-working engineers that spend a lot of time working through these details every day.
Corey: I still view S3 through the lens of: it is one of the easiest ways in the world to wind up building a static web server, because you basically stuff the website files into a bucket and then you check a box. So, it feels on some level that that is about as accurate as saying that S3 is a database. It can be used or misused or pressed into service in a whole bunch of different use cases. What have you seen from customers that has, I guess, taught you something you didn't expect to learn about your own service?
Kevin: Oh, I'd say we have those [laugh] meetings pretty regularly, when customers build their workloads and have unique patterns to them, whether it's the type of data they're retrieving or the access pattern on the data. You know, for example, some customers will make heavy use of our ability to do [ranged gets 00:22:47] on files and [unintelligible 00:22:48] objects. And that's a pretty good capability, but it can be one that's very much dependent on the type of file, right—certain files have structure, as far as, you know, a header or footer—and that data is being accessed in a certain order. Oftentimes, those may also be multi-part objects, and so they make use of the multi-part features to upload different chunks of a file in parallel. And, you know, also certainly when customers get into things like our batch operations capability, where they can literally write a Lambda function and do what they want, we've seen some pretty interesting use cases where customers are running large-scale operations across, you know, billions, sometimes tens of billions of objects, and it can be pretty interesting as far as what they're able to do with them.
So, for something that is, in some sense, as simple and basic as a GET and PUT API, all the capability around it ends up being pretty interesting as far as how customers apply it and the different workloads they run on it.
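[A minimal sketch of the ranged-GET pattern Kevin describes—the bucket, key, and byte range are hypothetical placeholders:]

import boto3

s3 = boto3.client("s3")

# Fetch only the first 64 KiB of an object (say, a file header)
# instead of downloading the whole thing.
resp = s3.get_object(
    Bucket="my-data-bucket",
    Key="datasets/big-file.bin",
    Range="bytes=0-65535",
)
header = resp["Body"].read()
print(f"fetched {len(header)} bytes")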
Corey: So, if you squint hard enough, what I'm hearing you tell me is that I can view all of this as, "Oh, yeah. S3 is also compute." And it feels like that's a fast track to getting a question wrong on one of the certification exams. But I have to ask, from your point of view, is S3 storage? And whether it's yes or no, what gets you excited about the space that it's in?
Kevin: Yeah, well, I would say S3 is not compute, but we have some great compute services that are very well integrated with S3, which excites me, as do things like S3 Object Lambda, where we actually handle that integration with Lambda. So, you're writing Lambda functions and we're executing them on the GET path. And so, that's a pretty exciting feature for me.
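[A minimal sketch of an S3 Object Lambda handler doing a toy transform on the GET path, assuming the Object Lambda access point is already wired up:]

import urllib.request

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # S3 hands the function a presigned URL for the original object,
    # plus routing tokens for returning the response.
    ctx = event["getObjectContext"]
    original = urllib.request.urlopen(ctx["inputS3Url"]).read()

    transformed = original.upper()  # toy transform: uppercase the bytes

    # Stream the transformed bytes back to the caller of GetObject.
    s3.write_get_object_response(
        Body=transformed,
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )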
In AWS, with S3 inventory reports, you can get a list of all your storage, and we are going to continue to add capability to help customers get their arms around what they have, first off, and understand how it's being used. That's where things like Storage Lens really play a big role, in understanding exactly what data is being accessed and not. We're definitely listening to customers carefully around this, and when you think about the broader data management story, that's a place where we're spending a lot of time right now, thinking about how we help customers get their arms around it: make sure they know what the categorization of certain data is, do I have some PII lurking here that I need to be very mindful of?

And then, how do I get to a world where, you know, I won't say that it's ever going to look like the perfect whiteboard picture you might draw on the wall. I don't think that's really ever achievable, but certainly getting to a point where customers have a real solid understanding of what data they have, and that the right controls are in place around all that data. Yeah, I think that's directionally where I see us heading.

Corey: As you look around how far the service has come, it feels like, on some level, there were some, I guess, I don't want to say missteps, but things that you learned as you went along. Like, back when the service was in beta, for example, there was no per-request charge. To my understanding, that was changed in part because people were trying to use it as a file system, and wow, that suddenly caused a tremendous amount of load on some of the underlying systems. You originally launched with a BitTorrent endpoint as an option so that people could download large datasets through peer-to-peer approaches, and it turned out that wasn't really the way the internet evolved, either. And I'm curious, if you somehow had to build this from scratch, are there any other significant changes you would make in how the service was presented to customers and in how people talked about it in the early days? Effectively given a mulligan, what would you do differently?

Kevin: Well, I don't know, Corey. I mean, just given where it's grown to in macro terms, you know, I definitely would be worried, taking a mulligan, that I [laugh] would change the sort of overarching trajectory. Certainly, I think there's a few features here and there where, for whatever reason, it was exciting at the time and really spoke to what customers at the time were thinking, but over time, you know, those needs quickly moved to something a little bit different. And, you know, like you said, things like the BitTorrent support are one where, at some level, it seems like a great technical architecture for the internet, but certainly not something that we've seen dominate in the way things are done. Instead, we largely have a world where there's a lot of caching layers, but it still ends up being largely client-server kinds of connections. So, I certainly wouldn't do a mulligan on any of the major functionality, and I think, you know, there's a few things in the details where, obviously, we've learned what really works in the end. I think we learned that we wanted bucket names to really strictly conform to rules for DNS naming. So, that was a change that was made at some point.
And we would tweak that, but no major changes, certainly.

Corey: One subject of some debate while we were designing this year's charity t-shirt, which, incidentally, if you're listening to this, you can pick up for yourself at snark.cloud/shirt, was: is S3 itself dependent upon S3? Because we know that every other service out there is. But it is interesting to come up with an idea of, "Oh, yeah. We're going to launch a whole new isolated region of S3 without S3 to lean on." That feels like it's an almost impossible bootstrapping problem.

Kevin: Well, S3 is not dependent on S3 to come up, and there's certainly a critical dependency tree that we look at and track to make sure that we have an acyclic graph as we look at dependencies.

Corey: That is such a sophisticated way to say what I learned the hard way when I was significantly younger and working in production environments: don't put the DNS servers needed to boot the hypervisor into VMs that require a working hypervisor. It's one of those oh, yeah, in hindsight, that makes perfect sense, but you learn it right after that knowledge really would have been useful.

Kevin: Yeah, absolutely. And one of the terms we use for that is the idea of static stability; that's one of the techniques that can really help with isolating a dependency. We actually have an article about that in the Amazon Builders' Library, and there are actually a bunch of really good articles in there from very experienced operations-focused engineers in AWS. So, static stability is one of those key techniques, but there are other techniques; I mean, just pure minimization of dependencies is one. And so, we were very, very thoughtful about that, particularly for that core layer.

I mean, you know, when you talk about S3 with 200-plus microservices, or 235-plus microservices, I would say not all of those services are critical for every single request. Certainly, a small subset of those are required for every request, and then other services actually help manage and scale that inner core of services. And so, we look at dependencies on a service-by-service basis to really make sure that inner core is as minimized as possible. And then the outer layers can start to take some dependencies once you have that basic functionality up.

Corey: I really want to thank you for being as generous with your time as you have been. If people want to learn more about you and about S3 itself, where should they go, after buying a t-shirt, of course?

Kevin: Well, certainly buy the t-shirt first. I love the t-shirts and the charity that you work with to do that. Obviously, for S3, it's aws.amazon.com/s3. And you can actually learn more about me: I have some YouTube videos, so you can search for me on YouTube and kind of get a sense of me.

Corey: We will put links to that in the show notes, of course. Thank you so much for being so generous with your time. I appreciate it.

Kevin: Absolutely. Yeah. Glad to spend some time. Thanks for the questions, Corey.

Corey: Kevin Miller, vice president and general manager for Amazon S3. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, ignorant comment talking about how your S3-compatible service is going to blow everyone's socks off when it fails.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
All right, so I'm here with 52 Weeks of AWS, still continuing to do developer certification. I'm gonna go ahead and share my screen here.

All right, so we are on Lambda, one of my favorite topics. Let's get right into it and talk about how to develop event-driven solutions with AWS Lambda.

With serverless computing, one of the things it's going to do is change the way you think about building software. In a traditional deployment environment, you would configure an instance, you would update an OS, you'd install applications, build and deploy them, load balance. So that is non-cloud-native computing. With serverless, you really only need to focus on building and deploying applications, and then monitoring and maintaining the applications. So really, what serverless does is allow you to focus on the code for the application: you don't have to manage the operating system or the servers, or scale it, and that really is a huge advantage, because you don't have to pay for the infrastructure when the code isn't running. And that's really a key takeaway.

If you take a look at the AWS serverless platform, there's a bunch of fully managed services that are tightly integrated with Lambda. And so this is another huge advantage of Lambda: it isn't necessarily that it's the fastest or that it has the most powerful execution, it's the tight integration with the rest of the AWS platform, and developer tools like the AWS Serverless Application Model, or AWS SAM, help you simplify the deployment of serverless applications. Some of the services include Amazon S3, Amazon SNS, Amazon SQS, and the AWS SDKs.

So, in terms of Lambda: AWS Lambda is a compute service for serverless, and it lets you run code without provisioning or managing servers. It allows you to trigger your code in response to events that you configure, like, for example, dropping something into an S3 bucket, say an image, and then a Lambda transcodes it to a different format. It also allows you to scale automatically based on demand, and it incorporates built-in monitoring and logging with Amazon CloudWatch.

So, if you look at AWS Lambda, one of the things it does is enable you to bring in your own code. The code you write for Lambda isn't written in a new language; you can write things in tons of different languages for AWS Lambda: Node.js, Java, Python, C#, Go, Ruby. There are also custom runtimes, so you could do Rust or Swift or something like that. It also integrates very deeply with other AWS services, and you can invoke third-party applications as well.

It also has a very flexible resource and concurrency model. Lambda scales in response to events, so you just need to configure memory settings, and AWS handles the other details, like the CPU, the network, and the I/O throughput. Also, you can use the AWS Identity and Access Management service, or IAM, to grant access to whatever other resources you need. And this is one of the ways you control the security of Lambda: you have real guardrails around it, because you just tell Lambda, you have a role that covers whatever it is you need Lambda to do, talk to SQS or talk to S3, and it will specifically only do that role.

And the other thing about Lambda is that it has built-in availability and fault tolerance. Again, it's a fully managed service with high availability, and you don't have to do anything at all to use that. And one of the biggest things about Lambda is that you only pay for what you use. When the Lambda service is idle, you don't have to pay for it, versus something else, where, even in the case of a Kubernetes-based system, there's still a host machine running Kubernetes, and you have to pay for that.

So, one of the ways you can think about Lambda is that there are a bunch of different use cases for it. Let's start with web apps, which I think is one of the better ones to think about. You can combine AWS Lambda with other services, and you can build powerful web apps that automatically scale up and down. There's no administrative effort at all: there are no backups necessary, no multi-data-center redundancy, it's done for you.

Backends: you can build serverless backends that let you handle web, mobile, IoT, and third-party applications. You can build those backends with Lambda and with API Gateway, and you can build applications with them.
In terms of data processing, you can also use Lambda to run code in response to a trigger: a change in data, a shift in system state. And really, for the most part, all of AWS is able to be orchestrated with Lambda, so it's really a glue-type service that you're able to use.

Now chatbots, that's another great use case for it. Amazon Lex is a service for building conversational chatbots, and you can use it with Lambda. The Lambda service is also able to be used with voice and IT automation. These are all great use cases for Lambda. In fact, I would say it's kind of the go-to automation tool for AWS.

So let's talk about how Lambda works next. The way Lambda works is that there's a function and there's an event source, and these are the core components. The event source is the entity that publishes events to AWS Lambda, and the Lambda function is the code that you're going to use to process the event. AWS Lambda runs that Lambda function on your behalf. A few things to consider: it really is just a little bit of code, and you can configure the triggers to invoke a function in response to resource lifecycle events, like, for example, responding to incoming HTTP requests, consuming events from a queue, as in the case of SQS, or running on a schedule. Running on a schedule is actually a really good data engineering task, right? Like, you could run it periodically to scrape a website.
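As a concrete illustration of that schedule-driven use case, here is a minimal sketch of a handler that fetches a page on a timer. The URL is hypothetical, and the schedule itself would be configured separately (for example, with an EventBridge rule that invokes the function).

```python
import json
import urllib.request

PAGE_URL = "https://example.com/status"  # hypothetical page to scrape

def lambda_handler(event, context):
    # For a scheduled trigger we usually don't need the event's
    # contents, just the fact that we were invoked.
    with urllib.request.urlopen(PAGE_URL, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    # Do something minimal with the page, e.g. report its size.
    return {"statusCode": 200, "body": json.dumps({"bytes": len(body)})}
```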
So, as a developer, when you create Lambda functions that are managed by the AWS Lambda service, you can define the permissions for the function and basically specify which events will actually trigger it. You can also create a deployment package that includes your application code and any dependencies or libraries necessary to run the code, and you can also configure things like the memory, the timeout, and the concurrency. Then, when your function is invoked, Lambda provides a runtime environment based on the runtime and configuration options that you selected.

So let's talk about models for invoking Lambda functions. An event source invokes a Lambda function by either a push or a pull model. In the case of a push, the event source directly invokes the Lambda function when the event occurs. In the case of a pull model, the information is put into a stream or a queue, and Lambda polls that stream or queue, then invokes the function when it detects an event.

A few different examples: some services can actually invoke the function directly. For a synchronous invocation, the other service waits for the response from the function. A good example would be Amazon API Gateway, which would be the REST-based service in front; in this case, when a client makes a request to your API, that client gets a response immediately. With this model, there's no built-in retry in Lambda. Examples of this would be Elastic Load Balancing, Amazon Cognito, Amazon Lex, Amazon Alexa, Amazon API Gateway, AWS CloudFormation, Amazon CloudFront, and also Amazon Kinesis Data Firehose.

For asynchronous invocation, AWS Lambda queues the event before it passes it to your function. The other service gets a success response as soon as the event is queued, and if an error occurs, Lambda will automatically retry the invocation twice. Good examples of this would be S3, SNS, SES (the Simple Email Service), AWS CloudFormation, Amazon CloudWatch Logs, CloudWatch Events, AWS CodeCommit, and AWS Config.

In both cases, you can invoke a Lambda function using the Invoke operation, and you can specify the invocation type as either synchronous or asynchronous. When you use an AWS service as a trigger, the invocation type is predetermined for each service, and so you have no control over the invocation type that these event sources use when they invoke your Lambda.
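To make the synchronous/asynchronous distinction concrete, here is a hedged boto3 sketch; the function name is hypothetical. "RequestResponse" blocks until the function returns its result, while "Event" just queues the invocation.

```python
import json
import boto3

client = boto3.client("lambda")

# Synchronous: the caller blocks until the function returns its result.
sync = client.invoke(
    FunctionName="my-first-function",  # hypothetical function
    InvocationType="RequestResponse",
    Payload=json.dumps({"hello": "world"}),
)
print(json.load(sync["Payload"]))

# Asynchronous: Lambda queues the event and returns immediately;
# failed invocations are retried twice, as described above.
async_resp = client.invoke(
    FunctionName="my-first-function",
    InvocationType="Event",
    Payload=json.dumps({"hello": "world"}),
)
print(async_resp["StatusCode"])  # 202 means the event was queued
```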
In the polling model, the event sources put information into a stream or a queue, and AWS Lambda polls the stream or the queue. When it finds a record, it delivers the payload and invokes the function. In this model, Lambda itself is basically pulling data from a stream or a queue for processing by the Lambda function. Examples of stream-based event sources would be Amazon DynamoDB or Amazon Kinesis Data Streams, and these stream records are organized into shards. So Lambda polls the stream for records and then attempts to invoke the function. If there's a failure, AWS Lambda won't read any of the new records until the failed batch of records expires or is processed successfully.

In the non-streaming case, which would be SQS, Lambda polls the queue for records. If processing fails or times out, the message is returned to the queue, and Lambda keeps retrying the failed message until it's processed successfully. If the message expires, which is something you can configure with SQS, then it'll just be discarded. And you can create a mapping between an event source and a Lambda function right inside the console; this is how you would typically set that up manually, without using infrastructure as code.
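The same mapping can also be created programmatically. A minimal sketch, assuming a hypothetical queue ARN and function name, and assuming the function's execution role already allows reading from the queue:

```python
import boto3

client = boto3.client("lambda")

# Map an SQS queue to a function: Lambda polls the queue and invokes
# the function with batches of records, per the polling model above.
client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:example-queue",
    FunctionName="my-first-function",
    BatchSize=10,  # records delivered per invocation
)
```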
All right, let's talk about permissions. This is definitely an easy place to get tripped up when you're first using AWS Lambda. There are two types of permissions. The first is that the event source needs permission to trigger the Lambda function; this is the invocation permission. The second is that the Lambda function needs permission to interact with other services; these are the run permissions. And these are both handled via the IAM service, the AWS Identity and Access Management service.

The IAM resource policy tells the Lambda service which push event sources have permission to invoke the Lambda function, and these resource policies make it easy to grant access to a Lambda function across AWS accounts. A good example would be if you have an S3 bucket in your account and you need to invoke a function in another account: you could create a resource policy that allows those to interact with each other. The resource policy for a Lambda function is called a function policy, and when you add a trigger to your Lambda function from the console, the function policy is generated automatically; it allows the event source to take the lambda:InvokeFunction action.

A good example would be granting Amazon S3 permission to invoke a Lambda function called my-first-function. Basically, the effect would be Allow; under principal, you would have the service s3.amazonaws.com; the action would be lambda:InvokeFunction; the resource would be the name, or the ARN, of the Lambda; and the condition would be the ARN of the bucket. And really, that's it in a nutshell.

The Lambda execution role grants your Lambda function permission to access AWS services and resources, and you select or create the execution role when you create a Lambda function. The IAM policy defines the actions the Lambda function is allowed to take, and the trust policy allows the Lambda service to assume the execution role. To grant permissions to AWS Lambda to assume a role, you have to have permission for the iam:PassRole action.

A couple of examples of relevant policies for an execution role: the IAM policy we talked about earlier would allow the function to interact with S3; another example would be one that lets it interact with CloudWatch Logs, creating a log group and streaming logs to it. The trust policy gives the Lambda service permission to assume a role and invoke a Lambda function on your behalf.
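The S3 function policy described above can be added with a single API call; here is a hedged boto3 sketch with hypothetical names, roughly the statement the console generates for you when you add an S3 trigger.

```python
import boto3

client = boto3.client("lambda")

# Allow the S3 service principal to invoke the function, but only for
# events coming from one specific bucket (the condition in the policy).
client.add_permission(
    FunctionName="my-first-function",        # hypothetical function
    StatementId="s3-invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn="arn:aws:s3:::example-bucket",  # the bucket's ARN
)
```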
Now let's talk about an overview of authoring and configuring Lambda functions. To start with, to create a Lambda function, you first need to create a Lambda function deployment package, which is a zip or jar file that consists of your code and any dependencies. With Lambda, you can use the programming language and integrated development environment that you're most familiar with, and you can actually bring the code you've already written. Lambda supports lots of different languages, like Node.js, Python, Ruby, Java, Go, and .NET runtimes, and you can also implement a custom runtime if you want to use a different language, which is actually pretty cool.

When you create a Lambda function, you specify the handler; the Lambda function handler is the entry point. A few aspects of it are important to pay attention to. The event object provides information about the event that triggered the Lambda function. This could be a predefined object that an AWS service generates; for example, in the AWS console, you can actually ask for these objects, and it'll give you the JSON structure so you can test things out. The contents of the event object include everything you need to actually process the event.

The context object is generated by AWS, and it's really runtime information. If you need some kind of runtime information about your code, say environment variables, the AWS request ID, the log stream, or the remaining time in millis (that one returns the number of milliseconds that remain before your function times out), you can get all of that from the context object.

So what about an example that runs in Python? It's pretty straightforward, actually. All you need is a handler; the handler would be a Python function, it takes an event and a context, you pass those in, and then you return some kind of message.
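Here is roughly the minimal handler just described: an event and a context in, a message out. The remaining-time call illustrates the kind of runtime information the context object carries.

```python
import json

def lambda_handler(event, context):
    # 'event' carries the trigger's payload (e.g. an S3 notification);
    # 'context' carries runtime information about this invocation.
    millis_left = context.get_remaining_time_in_millis()
    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": "hello from Lambda",
            "request_id": context.aws_request_id,
            "ms_remaining": millis_left,
        }),
    }
```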
A few best practices to remember for AWS Lambda: separate the core business logic from the handler method; this makes your code more portable and enables you to target unit tests without having to worry about the configuration, which is always a really good idea in general. Make sure you have modular functions, so you have a single-purpose function rather than a kitchen-sink function. Treat functions as stateless as well: a function basically does one thing, and when it's done, there is no state kept anywhere. And also, only include what you need: you don't want huge Lambda functions, and one of the ways you can avoid that is by reducing the time it takes Lambda to unpack the deployment package; you can also minimize the complexity of your dependencies.

You can also reuse the temporary runtime environment to improve the performance of a function. The temporary runtime environment initializes any external dependencies of the Lambda code, and you can make sure that any externalized configuration or dependency your code retrieves is stored and referenced locally after the initial run. That means limiting the re-initialization of variables and objects on every invocation, and keeping alive and reusing connections, like an HTTP or database connection, that were established during a previous invocation. A really good example of this would be a socket connection: if you make a socket connection, and that socket connection took two seconds to spawn, you don't want every call to Lambda to wait two seconds; you want to reuse that socket connection.
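A sketch of that reuse pattern: anything created outside the handler survives across warm invocations, so the client setup (here a hypothetical DynamoDB table handle) is paid for once rather than on every call.

```python
import boto3

# Created once per runtime environment, then reused on warm
# invocations -- the same idea as keeping a slow socket connection alive.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-table")  # hypothetical table

def lambda_handler(event, context):
    # The handler only does per-request work; the connection setup
    # above is amortized across invocations.
    table.put_item(Item={"pk": event.get("id", "unknown")})
    return {"statusCode": 200}
```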
A few more good best practices: include logging statements. This is kind of a big one in any cloud computing operation, especially when it's distributed; if you don't log it, there's no way you can figure out what's going on. So you must add logging statements that have context, so you know which particular Lambda instance it's actually occurring in. Also include results, so you know what happened when the Lambda ran. Use environment variables as well, so you can figure out things like which bucket it was writing to. And don't write recursive code; that's really a no-no. You want to write very simple functions with Lambda.

There are a few different ways to write Lambda functions. You can use the console editor, which I use all the time; I like to just play around with it. The downside is that if you do need custom libraries, you're not going to be able to use them, other than, say, the AWS SDK; but for simple things, it's a great use case. Another way is to upload a package to the AWS console: you can create a deployment package in an IDE (for example, in Visual Studio for .NET, you can actually right-click and deploy directly into Lambda). Another way is to upload the entire package into a bucket in S3, and Lambda will grab it out of that S3 bucket.

A few things to remember about Lambda: the memory and the timeout are configurations that determine how the Lambda function performs, and these affect the billing. Now, one of the great things about Lambda is that it's amazingly inexpensive to run, and the reason is that you're charged based on the number of requests for a function. For memory, if you specify more memory, it's going to increase the cost. For timeout, you can control the duration of the function by setting the right kind of timeout, but if you make the timeout too long, it could cost you more money. So really, the best practices are to test the performance of your Lambda and make sure you have the optimal memory size, and to load test it to make sure you understand how the timeouts work. Just in general, with anything in cloud computing, you should load test it.

Now let's talk about an important topic, the final topic here, which is how to deploy Lambda functions. Versions are immutable copies of the code and the configuration of your Lambda function, and versioning allows you to publish one or more versions of your Lambda function. As a result, you can work with different variations of your Lambda function in your development workflow: development, beta, production, et cetera. When you create a Lambda function, there's only one version, the latest version, $LATEST, and you can refer to the function using its ARN, or Amazon Resource Name. When you publish a new version, AWS Lambda makes a snapshot of the latest version to create the new version.

You can also create an alias for a Lambda function. Conceptually, an alias is just like a pointer to a specific function version, and you can use the alias ARN to reference the Lambda function version that's currently associated with the alias. What's nice about aliases is that you can roll back and forth between different versions, which is pretty nice, because in the case of deploying a new version, if there's a huge problem with it, you just toggle it right back, and there's really not a big issue in terms of rolling back your code.
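A hedged sketch of that versioning flow with boto3: publish a snapshot, point a "prod" alias at it, and later repoint the alias to roll forward or back. The function name and version numbers are hypothetical.

```python
import boto3

client = boto3.client("lambda")

# Snapshot the current $LATEST code and configuration as a new version.
version = client.publish_version(FunctionName="my-first-function")

# Point a 'prod' alias at that version (create once, then update).
client.create_alias(
    FunctionName="my-first-function",
    Name="prod",
    FunctionVersion=version["Version"],
)

# Rolling back later is just repointing the alias at an older version.
client.update_alias(
    FunctionName="my-first-function",
    Name="prod",
    FunctionVersion="3",  # hypothetical known-good version
)
```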
Now, let's take a look at an example where Amazon S3 is the event source that invokes your Lambda function every time a new object is created. When Amazon S3 is the event source, you store the information for the event source mapping in the bucket notification configuration, and in that configuration you identify the Lambda function ARN that Amazon S3 can invoke. But in some cases, you're going to have to update the notification configuration so that Amazon S3 invokes the correct version each time you publish a new version of your Lambda function. So basically, instead of specifying the function ARN, you can specify an alias ARN in the notification configuration. As you promote a new version of the Lambda function into production, you only need to update the prod alias to point to the latest stable version, and you don't need to update the notification configuration in Amazon S3.

When you build serverless applications, it's common to have code that's shared across Lambda functions. It could be custom code, it could be a standard library, et cetera. Before, and this was really a big limitation, you had to have all the code deployed together. But now, one of the really cool things you can do is have a Lambda function include additional code as a layer. A layer is basically a zip archive that contains a library, maybe a custom runtime, maybe even some kind of really cool pre-trained model. With layers, you can use the libraries in your function without needing to include them in your deployment package, and it's a best practice to have smaller deployment packages and share common dependencies with layers.

Layers also help you keep your deployment package really small. For Node.js, Python, and Ruby functions, you can develop your function code in the console as long as you keep the package under three megabytes. A function can use up to five layers at a time, which is pretty incredible, actually; that means you can have, you know, basically up to 250 megabytes total. For many languages, this is plenty of space. Also, Amazon has published a public layer that includes really popular libraries like NumPy and SciPy, which dramatically helps with data processing and machine learning.
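A sketch of publishing and attaching a layer, assuming the zipped dependencies have already been uploaded to a hypothetical S3 bucket; the layer, bucket, and function names are placeholders.

```python
import boto3

client = boto3.client("lambda")

# Publish the shared dependencies as a new layer version.
layer = client.publish_layer_version(
    LayerName="shared-deps",
    Content={"S3Bucket": "example-bucket", "S3Key": "layers/deps.zip"},
    CompatibleRuntimes=["python3.12"],
)

# Attach the layer to a function; the function's own package stays small.
client.update_function_configuration(
    FunctionName="my-first-function",
    Layers=[layer["LayerVersionArn"]],
)
```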
Now, if I had to predict the future, and I wanted to predict a massive announcement, I would say that what AWS could do is have a GPU-enabled layer at some point that would include pre-trained models. If they did something like that, it could really open up the doors for the pre-trained model revolution, and I would bet that that's possible.

All right, well, in a nutshell, AWS Lambda is one of my favorite services, and I think it's worth the time of everybody who's interested in AWS to play around with it. All right, next week, I'm going to cover API Gateway. All right, see you next week.

If you enjoyed this video, here are additional resources to look at:
Coursera + Duke Specialization: Building Cloud Computing Solutions at Scale Specialization: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale
Python, Bash, and SQL Essentials for Data Engineering Specialization: https://www.coursera.org/specializations/python-bash-sql-data-engineering-duke
AWS Certified Solutions Architect - Professional (SAP-C01) Cert Prep: 1 Design for Organizational Complexity: https://www.linkedin.com/learning/aws-certified-solutions-architect-professional-sap-c01-cert-prep-1-design-for-organizational-complexity/design-for-organizational-complexity?autoplay=true
Essentials of MLOps with Azure and Databricks: https://www.linkedin.com/learning/essentials-of-mlops-with-azure-1-introduction/essentials-of-mlops-with-azure
O'Reilly Book: Implementing MLOps in the Enterprise
O'Reilly Book: Practical MLOps: https://www.amazon.com/Practical-MLOps-Operationalizing-Machine-Learning/dp/1098103017
O'Reilly Book: Python for DevOps: https://www.amazon.com/gp/product/B082P97LDW/
O'Reilly Book: Developing on AWS with C#: A Comprehensive Guide on Using C# to Build Solutions on the AWS Platform: https://www.amazon.com/Developing-AWS-Comprehensive-Solutions-Platform/dp/1492095877
Pragmatic AI: An Introduction to Cloud-based Machine Learning: https://www.amazon.com/gp/product/B07FB8F8QP/
Pragmatic AI Labs Book: Python Command-Line Tools: https://www.amazon.com/gp/product/B0855FSFYZ
Pragmatic AI Labs Book: Cloud Computing for Data Analysis: https://www.amazon.com/gp/product/B0992BN7W8
Pragmatic AI Book: Minimal Python: https://www.amazon.com/gp/product/B0855NSRR7
Pragmatic AI Book: Testing in Python: https://www.amazon.com/gp/product/B0855NSRR7
Subscribe to Pragmatic AI Labs YouTube Channel: https://www.youtube.com/channel/UCNDfiL0D1LUeKWAkRE1xO5Q
Subscribe to 52 Weeks of AWS Podcast: https://52-weeks-of-cloud.simplecast.com
View content on noahgift.com: https://noahgift.com/
View content on Pragmatic AI Labs Website: https://paiml.com/
Hey, three, two, one, there we go, we're live. All right, so welcome, Simon, to Enterprise ML Ops Interviews. The goal of these interviews is to get people exposed to real professionals who are doing work in MLOps. It's such a cutting-edge field that I think a lot of people are very curious about it. What is it? You know, how do you do it? And I'm very honored to have Simon here. Do you want to introduce yourself and maybe talk a little bit about your background?

Sure. Yeah, thanks again for inviting me. My name is Simon Stebelena, or Simon. I am originally from Austria, but currently working in the Netherlands, in Amsterdam, at Transaction Monitoring Netherlands. Here I am the lead MLOps engineer.

What are we doing at TMNL, actually? We are a data processing company, actually. We are owned by the five large banks of the Netherlands, and our purpose is kind of what the name says.
banks[01:13.240 --> 01:15.760] to detect unusual patterns on that transaction graph[01:15.760 --> 01:19.000] that might indicate money laundering.[01:19.000 --> 01:20.520] That's a natural what we do.[01:20.520 --> 01:21.800] So as you can imagine,[01:21.800 --> 01:24.160] we are really focused on building models[01:24.160 --> 01:27.280] and obviously ML Ops is a big component there[01:27.280 --> 01:29.920] because that is really the core of what you do.[01:29.920 --> 01:32.680] You wanna do it efficiently and effectively as well.[01:32.680 --> 01:34.760] In my role as lead ML Ops engineer,[01:34.760 --> 01:36.880] I'm on the one hand the lead engineer[01:36.880 --> 01:38.680] of the actual ML Ops platform team.[01:38.680 --> 01:40.200] So this is actually a centralized team[01:40.200 --> 01:42.680] that builds out lots of the infrastructure[01:42.680 --> 01:47.320] that's needed to do modeling effectively and efficiently.[01:47.320 --> 01:50.360] But also I am the craft lead[01:50.360 --> 01:52.640] for the machine learning engineering craft.[01:52.640 --> 01:55.120] These are actually in our case, the machine learning engineers,[01:55.120 --> 01:58.360] the people working within the model development teams[01:58.360 --> 01:59.360] and cross functional teams[01:59.360 --> 02:01.680] actually building these models.[02:01.680 --> 02:03.640] That's what I'm currently doing[02:03.640 --> 02:05.760] during the evenings and weekends.[02:05.760 --> 02:09.400] I'm also lecturer at the University of Applied Sciences, Vienna.[02:09.400 --> 02:12.080] And there I'm teaching data mining[02:12.080 --> 02:15.160] and data warehousing to master students, essentially.[02:16.240 --> 02:19.080] Before TMNL, I was at bold.com,[02:19.080 --> 02:21.960] which is the largest eCommerce retailer in the Netherlands.[02:21.960 --> 02:25.040] So I always tend to see the Amazon of the Netherlands[02:25.040 --> 02:27.560] or been a lux actually.[02:27.560 --> 02:30.920] It is still the biggest eCommerce retailer in the Netherlands[02:30.920 --> 02:32.960] even before Amazon actually.[02:32.960 --> 02:36.160] And there I was an expert machine learning engineer.[02:36.160 --> 02:39.240] So doing somewhat comparable stuff,[02:39.240 --> 02:42.440] a bit more still focused on the actual modeling part.[02:42.440 --> 02:44.800] Now it's really more on the infrastructure end.[02:45.760 --> 02:46.760] And well, before that,[02:46.760 --> 02:49.360] I spent some time in consulting, leading a data science team.[02:49.360 --> 02:50.880] That's actually where I kind of come from.[02:50.880 --> 02:53.360] I really come from originally the data science end.[02:54.640 --> 02:57.840] And there I kind of started drifting towards ML Ops[02:57.840 --> 02:59.200] because we started building out[02:59.200 --> 03:01.640] a deployment and serving platform[03:01.640 --> 03:04.440] that would as consulting company would make it easier[03:04.440 --> 03:07.920] for us to deploy models for our clients[03:07.920 --> 03:10.840] to serve these models, to also monitor these models.[03:10.840 --> 03:12.800] And that kind of then made me drift further and further[03:12.800 --> 03:15.520] down the engineering lane all the way to ML Ops.[03:17.000 --> 03:19.600] Great, yeah, that's a great background.[03:19.600 --> 03:23.200] I'm kind of curious in terms of the data science[03:23.200 --> 03:25.240] to ML Ops journey,[03:25.240 --> 03:27.720] that I think would be a great discussion[03:27.720 --> 03:29.080] to dig into a little bit.[03:30.280 --> 
03:34.320] My background is originally more on the software engineering[03:34.320 --> 03:36.920] side and when I was in the Bay Area,[03:36.920 --> 03:41.160] I did individual contributor and then ran companies[03:41.160 --> 03:44.240] at one point and ran multiple teams.[03:44.240 --> 03:49.240] And then as the data science field exploded,[03:49.240 --> 03:52.880] I hired multiple data science teams and worked with them.[03:52.880 --> 03:55.800] But what was interesting is that I found that[03:56.840 --> 03:59.520] I think the original approach of data science[03:59.520 --> 04:02.520] from my perspective was lacking[04:02.520 --> 04:07.240] in that there wasn't really like deliverables.[04:07.240 --> 04:10.520] And I think when you look at a software engineering team,[04:10.520 --> 04:12.240] it's very clear there's deliverables.[04:12.240 --> 04:14.800] Like you have a mobile app and it has to get better[04:14.800 --> 04:15.880] each week, right?[04:15.880 --> 04:18.200] Where else, what are you doing?[04:18.200 --> 04:20.880] And so I would love to hear your story[04:20.880 --> 04:25.120] about how you went from doing kind of more pure data science[04:25.120 --> 04:27.960] to now it sounds like ML Ops.[04:27.960 --> 04:30.240] Yeah, yeah, actually.[04:30.240 --> 04:33.800] So back then in consulting one of the,[04:33.800 --> 04:36.200] which was still at least back then in Austria,[04:36.200 --> 04:39.280] data science and everything around it was still kind of[04:39.280 --> 04:43.720] in this infancy back then 2016 and so on.[04:43.720 --> 04:46.560] It was still really, really new to many organizations,[04:46.560 --> 04:47.400] at least in Austria.[04:47.400 --> 04:50.120] There might be some years behind in the US and stuff.[04:50.120 --> 04:52.040] But back then it was still relatively fresh.[04:52.040 --> 04:55.240] So in consulting, what we very often struggled with was[04:55.240 --> 04:58.520] on the modeling end, problems could be solved,[04:58.520 --> 05:02.040] but actually then easy deployment,[05:02.040 --> 05:05.600] keeping these models in production at client side.[05:05.600 --> 05:08.880] That was always a bit more of the challenge.[05:08.880 --> 05:12.400] And so naturally kind of I started thinking[05:12.400 --> 05:16.200] and focusing more on the actual bigger problem that I saw,[05:16.200 --> 05:19.440] which was not so much building the models,[05:19.440 --> 05:23.080] but it was really more, how can we streamline things?[05:23.080 --> 05:24.800] How can we keep things operating?[05:24.800 --> 05:27.960] How can we make that move easier from a prototype,[05:27.960 --> 05:30.680] from a PUC to a productionized model?[05:30.680 --> 05:33.160] Also how can we keep it there and maintain it there?[05:33.160 --> 05:35.480] So personally I was really more,[05:35.480 --> 05:37.680] I saw that this problem was coming up[05:38.960 --> 05:40.320] and that really fascinated me.[05:40.320 --> 05:44.120] So I started jumping more on that exciting problem.[05:44.120 --> 05:45.080] That's how it went for me.[05:45.080 --> 05:47.000] And back then we then also recognized it[05:47.000 --> 05:51.560] as a potential product in our case.[05:51.560 --> 05:54.120] So we started building out that deployment[05:54.120 --> 05:56.960] and serving and monitoring platform, actually.[05:56.960 --> 05:59.520] And that then really for me, naturally,[05:59.520 --> 06:01.840] I fell into that rabbit hole[06:01.840 --> 06:04.280] and I also never wanted to get out of it again.[06:05.680 --> 
06:09.400] So the system that you built initially,[06:09.400 --> 06:10.840] what was your stack?[06:10.840 --> 06:13.760] What were some of the things you were using?[06:13.760 --> 06:17.000] Yeah, so essentially we had,[06:17.000 --> 06:19.560] when we talk about the stack on the backend,[06:19.560 --> 06:20.560] there was a lot of,[06:20.560 --> 06:23.000] so the full backend was written in Java.[06:23.000 --> 06:25.560] We were using more from a user perspective,[06:25.560 --> 06:28.040] the contract that we kind of had,[06:28.040 --> 06:32.560] our goal was to build a drag and drop platform for models.[06:32.560 --> 06:35.760] So basically the contract was you package your model[06:35.760 --> 06:37.960] as an MLflow model,[06:37.960 --> 06:41.520] and then you basically drag and drop it into a web UI.[06:41.520 --> 06:43.640] It's gonna be wrapped in containers.[06:43.640 --> 06:45.040] It's gonna be deployed.[06:45.040 --> 06:45.880] It's gonna be,[06:45.880 --> 06:49.680] there will be a monitoring layer in front of it[06:49.680 --> 06:52.760] based on whatever the dataset is you trained it on.[06:52.760 --> 06:55.920] You would automatically calculate different metrics,[06:55.920 --> 06:57.360] different distributional metrics[06:57.360 --> 06:59.240] around your variables that you are using.[06:59.240 --> 07:02.080] And so we were layering this approach[07:02.080 --> 07:06.840] to, so that eventually every incoming request would be,[07:06.840 --> 07:08.160] you would have a nice dashboard.[07:08.160 --> 07:10.040] You could monitor all that stuff.[07:10.040 --> 07:12.600] So stackwise it was actually MLflow.[07:12.600 --> 07:15.480] Specifically MLflow models a lot.[07:15.480 --> 07:17.920] Then it was Java in the backend, Python.[07:17.920 --> 07:19.760] There was a lot of Python,[07:19.760 --> 07:22.040] especially PySpark component as well.[07:23.000 --> 07:25.880] There was a, it's been quite a while actually,[07:25.880 --> 07:29.160] there was a quite some part written in Scala.[07:29.160 --> 07:32.280] Also, because there was a component of this platform[07:32.280 --> 07:34.800] was also a bit of an auto ML approach,[07:34.800 --> 07:36.480] but that died then over time.[07:36.480 --> 07:40.120] And that was also based on PySpark[07:40.120 --> 07:43.280] and vanilla Spark written in Scala.[07:43.280 --> 07:45.560] So we could facilitate the auto ML part.[07:45.560 --> 07:48.600] And then later on we actually added that deployment,[07:48.600 --> 07:51.480] the easy deployment and serving part.[07:51.480 --> 07:55.280] So that was kind of, yeah, a lot of custom build stuff.[07:55.280 --> 07:56.120] Back then, right?[07:56.120 --> 07:59.720] There wasn't that much MLOps tooling out there yet.[07:59.720 --> 08:02.920] So you need to build a lot of that stuff custom.[08:02.920 --> 08:05.280] So it was largely custom built.[08:05.280 --> 08:09.280] Yeah, the MLflow concept is an interesting concept[08:09.280 --> 08:13.880] because they provide this package structure[08:13.880 --> 08:17.520] that at least you have some idea of,[08:17.520 --> 08:19.920] what is gonna be sent into the model[08:19.920 --> 08:22.680] and like there's a format for the model.[08:22.680 --> 08:24.720] And I think that part of MLflow[08:24.720 --> 08:27.520] seems to be a pretty good idea,[08:27.520 --> 08:30.080] which is you're creating a standard where,[08:30.080 --> 08:32.360] you know, if in the case of,[08:32.360 --> 08:34.720] if you're using scikit learn or something,[08:34.720 --> 08:37.960] you 
Guest: Yeah, that was also our thinking back then. We thought a lot about what could become the standard for how you package models. Back then MLflow was one of the few tools that already existed, and of course Databricks was behind it, so we made a bet on it and said: all right, let's follow that packaging standard and make it the contract for how you, as a data scientist, package a model and submit it to the platform.

Host: It's interesting, because this reminds me of one of the issues happening right now with cloud computing. AWS has dominated for a long time, with, I think, around 40% market share globally. Azure is now gaining with some pretty good traction, and GCP has been down for a bit, maybe in the 10% range or something like that. But what's interesting is that the cloud providers haven't necessarily been leading the way on things like packaging models. They have their own proprietary systems, which have been developed and are continuing to be developed, like Vertex AI in the case of Google and SageMaker in the case of Amazon. Take SageMaker, for example: there isn't really an industry-wide standard of model packaging that SageMaker uses; it has its own proprietary approach that it builds in, and Vertex AI has its own as well. So I think it's interesting to see what's going to happen, because your original hypothesis was: let's pick something that looks like it has traction and isn't tied directly to a cloud provider, because Databricks can work on anything. That seems to be one of the stickier problems right now with MLOps: who's the leader? Who's developing the right kind of standard for tooling? Maybe that leads into you talking a little bit about what you're doing currently. Do you have any thoughts about the current tooling and what you're doing at your current company?

Guest: Absolutely. At my current organization, Transaction Monitoring Netherlands, we are fully on AWS, really almost cloud-native AWS. That also means everything we do on the modeling side revolves around SageMaker. For us specifically, as the MLOps team, we are building the platform around SageMaker capabilities. On that end, at least company-internally, we have a contract for how you must deploy models. There is only one way, what we call the golden path: the streamlined, highly automated path supported by the platform. This is the only way you can actually deploy models, and in our case that is a SageMaker pipeline object. We do large-scale batch processing; we're not doing anything real-time at present. We do post-transaction monitoring, so you essentially need to submit DAGs. That is what we use for training and what we eventually deploy. Our internal contract is: in your model repository there has to be one place with a function with a specific name, and that function must return a SageMaker pipeline object. So this is our internal contract, actually.
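A minimal sketch of what such a contract can look like, assuming the agreed-upon function name is `get_pipeline` (the function name, image URI, bucket, and instance type here are illustrative, not TMNL's actual values):

```python
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep


def get_pipeline(role: str) -> Pipeline:
    """Platform contract: every model repository exposes this function,
    and it must return a SageMaker Pipeline object."""
    input_data = ParameterString(
        name="InputData", default_value="s3://example-bucket/train/"
    )
    estimator = Estimator(
        image_uri="<training-image-uri>",  # illustrative placeholder
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://example-bucket/models/",
    )
    train_step = TrainingStep(
        name="Train",
        estimator=estimator,
        inputs={"train": TrainingInput(s3_data=input_data)},
    )
    return Pipeline(
        name="example-batch-pipeline",
        parameters=[input_data],
        steps=[train_step],
    )
```

With a convention like this, the platform can treat every model repository uniformly: import the module, call `get_pipeline`, then upsert and start the returned pipeline.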
Host: That's interesting. I know many people who are using SageMaker in production, and it does seem like one of its advantages is that AWS generally does a pretty good job at building solutions; if you look at the history of their services, the odds are pretty high that they'll keep getting better and keep improving things. And it seems like what I'm hearing from people, and maybe from your organization as well, is that the SDK for SageMaker is really the win, versus some of the UX tools they have, like the interfaces for Canvas and Studio. Is that what's happening?

Guest: Yeah. What we try to do is always think about our users. Who are our users? What capabilities and skills do they have? What freedom and abilities should they have to develop models? In our case, we don't really have use cases for things like Canvas, because our users are fairly mature teams that know how to do the data science work, of course, but also the engineering work. So things like Canvas don't play much of a role for us, because with the high abstraction layer of graphical, drag-and-drop tooling, you're also limited in what you can do, or at least in what you can do easily. For us it really is the strength and flexibility that the SageMaker SDK gives you, and in general the SDKs around most AWS services. But it also comes with challenges, of course. You give a lot of freedom, but you're also creating certain requirements for your model development teams, which is why we've been working on abstracting further away from the SDK. Our objective is that you should no longer be forced to interact with the raw SDK when you use SageMaker; instead, you have a thin layer of abstraction on top of what you're doing. That's something we're moving towards more and more, because flexibility is valuable, but it often comes at the cost of speed, specifically for the 90% default stuff that you want to do.
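As a rough illustration of that kind of thin layer (a sketch of the pattern, not TMNL's actual platform code; `make_estimator` and the default values are hypothetical), a wrapper can bake organizational defaults into the raw SDK so that teams only supply what varies:

```python
from sagemaker.estimator import Estimator

# Organizational defaults baked into the platform layer (illustrative values).
DEFAULTS = {
    "instance_count": 1,
    "instance_type": "ml.m5.xlarge",
    "output_path": "s3://example-bucket/models/",
}


def make_estimator(image_uri: str, role: str, **overrides) -> Estimator:
    """Thin wrapper over the raw SageMaker SDK: the 90% case needs only
    an image and a role, while anything else can still be overridden."""
    config = {**DEFAULTS, **overrides}
    return Estimator(image_uri=image_uri, role=role, **config)
```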
Host: One of the complaints I have about SageMaker is that it only uses virtual machines, and that seems like a strange strategy in some sense. I guess if you're doing batch only, it doesn't matter as much, and I actually think making your batch-based predictions very, very strong is a good strategy; in that case, the virtual machines are less of a complaint. But in the case of SageMaker endpoints, you have to spin up these really expensive virtual machines and let them run 24/7 to do online prediction. Is that something your organization evaluated and decided not to use? What are your thoughts behind that?

Guest: In our case, doing real-time or near-real-time inference is currently not really relevant, for a simple reason. When you think a bit more about the money laundering, or anti-money-laundering, space: every individual bank must do anti-money-laundering, and they have armies of people doing it. But consider the time it actually takes from one of their AML systems detecting something unusual, through a review process, until it eventually reaches the governmental institution that takes over the cases that have been validated at least twice as looking genuinely unusual. That takes a while; it can take quite some time. That's why it doesn't really matter whether you ship your prediction within a second or whether it takes a week or two. Hence, for us, the question of real-time inference hasn't come up so far. But indeed, for other use cases, and for private projects, we've considered SageMaker endpoints for a while, and it's exactly what you said: you need a very beefy machine running all the time, specifically when you have heavy GPU loads, and you're paying for that machine running 24/7 even though your load fluctuates quite a bit. That definitely becomes quite a consideration in what you go for.

Host: I've actually been talking to AWS about that, because one of the issues I have is that the AWS platform really pushes serverless, and my question for AWS is: so why aren't you using it? If you're pushing serverless for everything, why is SageMaker not serverless? Maybe they're going to do that; I don't know, I don't have any inside information, but it is interesting to hear you had similar concerns. I know there are two audience questions here: one asked what you do for data versioning, and the second is how you do event-based MLOps.

Guest: What do we do for data versioning? On the one hand, we're running a data lakehouse. The data we get from the financial institutions, from the banks, runs through a massive data pipeline, also on AWS; we're using Glue and Step Functions for that. Eventually it ends up, modeled to some extent, sanitized, and quality-checked, in our data lakehouse, and there we're actually using Hudi on top of S3. That's also what we use for versioning, for time travel and all these things.
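For context, Apache Hudi tables on S3 can be queried "as of" an earlier commit. A minimal PySpark sketch of that kind of time-travel read might look like this (paths and the timestamp are illustrative, and it assumes a Spark session with the Hudi bundle on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-time-travel").getOrCreate()

# Read the current state of a Hudi table stored on S3.
current = spark.read.format("hudi").load("s3://example-bucket/transactions/")

# Read the same table as of an earlier commit time (time travel).
historical = (
    spark.read.format("hudi")
    .option("as.of.instant", "2022-01-01 00:00:00")
    .load("s3://example-bucket/transactions/")
)
print(current.count(), historical.count())
```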
Guest: So that is Hudi on top of S3. Our model pipelines then plug in there and spit out predictions, what we call alerts, eventually. Those we version based on unique IDs. Using processing IDs, we track pretty much everything: every line of code that touched the data is related to a specific row. So for every single row in our predictions and in our alerts, we can trace back exactly which pipeline ran on it, which jobs were in that pipeline, which code exactly was running in each job, and which intermediate results were produced. We're basically adding lineage information to everything we output along the way, so we can track everything back using a few tools we've built.

Host: The tool you mentioned, I'm not familiar with it. What is it called again? Hudi? What is it? Maybe you can describe it.

Guest: Yeah. Hudi is essentially quite similar to other tools such as, what is the Databricks one called? Delta Lake, maybe? Yes, exactly. It's basically equivalent to Delta Lake; it's just that back then, when we looked into what we were going to use, Delta Lake was not open-sourced yet (Databricks open-sourced it a while ago), so we went for Hudi. It's essentially a layer on top of, in our case, S3 that allows you to more easily keep track of the actions you're performing on your data. So it's very similar to Delta Lake, just a solution that was open source earlier.
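The row-level lineage idea described a moment ago, stamping every output row with identifiers for the run that produced it, can be sketched in PySpark roughly like this (the column names and ID values are made up for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.appName("lineage-demo").getOrCreate()
predictions = spark.createDataFrame(
    [(1, 0.93), (2, 0.12)], ["account_id", "score"]
)

# Stamp every output row with identifiers for the pipeline run that
# produced it, so each alert can be traced back to exact code and jobs.
lineaged = (
    predictions
    .withColumn("processing_id", lit("run-2022-06-01-0001"))
    .withColumn("pipeline_name", lit("aml-batch-scoring"))
    .withColumn("code_version", lit("git-sha-abc123"))
)
lineaged.show()
```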
Host: I didn't know anything about that, so now I do; thanks for letting me know, I'll have to look into it. The other interesting stack-related question concerns some areas I think are emerging. Actually, there are multiple; maybe I'll bring them all up and we'll do them one by one. One is the concept of event-driven architecture versus maybe a static architecture. Obviously you're using Step Functions, so you're a fan of event-driven architecture. Maybe we'll start with that: what are your thoughts on going more event-driven in your organization?

Guest: In our case, essentially everything works event-driven. Since we're on AWS, we're using EventBridge, or CloudWatch Events; I think it's called EventBridge everywhere now. This is how we trigger pretty much everything in our stack. This is how we trigger our data pipelines when data comes in. This is how we trigger the different Lambdas that parse certain information out of our logs and store it in different databases. And, at some point in the past, this is also how we triggered new deployments when new models were approved in the model registry. So basically everything we've been doing is fully event-driven.

Host: I think this is a key thing you bring up. I've talked to many people who don't use AWS, who are experts in alternative technology, and one of the things I've heard some of them say is: oh, well, AWS isn't as fast as X or Y, Lambda isn't as fast as X or Y, or Kubernetes, and so on. But the point you bring up is exactly the way I think about AWS: the true advantage of the AWS platform is the tight integration between the services, and that you can design event-driven workflows. Would you say that's right?

Guest: Absolutely. I think designing event-driven workflows on AWS is incredibly easy to do, and it also comes incredibly naturally, and that's extremely powerful. Simply by having an easy way to trigger Lambdas from events, you can do pretty much everything and glue together everything you want. I think that gives you tremendous flexibility.
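As a hedged sketch of that pattern, an EventBridge rule (configured separately) can route an event, say, a model approval status change, to a Lambda handler like the following; the event field checked here is an assumption for illustration:

```python
import json


def handler(event, context):
    """Lambda entry point invoked by an EventBridge rule.

    The rule matches events such as data arriving or a model package
    changing state, and forwards the event JSON here. The reaction
    below is illustrative.
    """
    print("Received event:", json.dumps(event))
    detail = event.get("detail", {})
    # Assumed field name for a model-approval event; adjust to the
    # actual event schema in a real setup.
    if detail.get("ModelApprovalStatus") == "Approved":
        print("Model approved: kicking off the deployment step")
    return {"status": "ok"}
```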
Host: So there are two things that come to mind now. One is that if you're developing an MLOps platform, you can't ignore Lambda. I've had some people tell me, oh, well, we can do this and this and this better, and it's like: yeah, but if you're going to be on AWS, you have to understand why people use Lambda. It isn't speed; it's the ease of developing very rich solutions.

Guest: Absolutely. And it's the glue between the things you're building, eventually.

Host: And you can almost turn the thoughts in your mind directly into Lambdas; you can be thinking and building code so quickly.

Guest: Absolutely. Everything turns into: which event do I need to listen to? Then I trigger a Lambda, and that Lambda does this and that.

Host: The other part about Lambda that's pretty awesome is that it hooks into services that have infinite scale. Take SQS: you can't break SQS. There's nothing you can do to ever take SQS down; it handles unlimited requests in and unlimited requests out. How many systems are like that?

Guest: Yeah, absolutely.

Host: So a follow-up would be: maybe data scientists should learn Lambda and Step Functions in order to get into MLOps?

Guest: I think that's a yes. If you want to put a foot into MLOps and you are on AWS, then there is no way around learning these fundamentals. There's no way around learning things like: what is a Lambda? How do I create a Lambda via Terraform, or whatever tool you're using? How do I hook it up to an event? How do I use the AWS SDK to interact with different services? If you want to take a step into MLOps coming more from the data science side, it's extremely important to familiarize yourself with at least the fundamentals: how do you architect basic solutions on AWS, how do you glue services together, how do you make them speak to each other? Ideally, that's what the platform should take away from you as a pure data scientist; you shouldn't necessarily have to deal with that stuff. But if you want to make the move towards MLOps, learning about infrastructure, and specifically, in the context of AWS, about the services and how to use them, is really fundamental.
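A minimal sketch of the Lambda-plus-SQS glue being praised here: a handler that drains an SQS-triggered batch of messages (the message shape is illustrative):

```python
import json


def handler(event, context):
    """Invoked via an SQS event source mapping: Lambda polls the queue
    and hands this function batches of records (up to 10 by default)."""
    for record in event["Records"]:
        body = json.loads(record["body"])
        # Illustrative work: route each message to the right pipeline.
        print(f"Processing message {record['messageId']}: {body}")
    # Raising an exception would return the whole batch to the queue
    # (unless partial batch responses are configured).
    return {"processed": len(event["Records"])}
```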
Guest: Because this is automation, eventually. If you want to automate your complex processes, then you need to learn that stuff. How else are you going to do it?

Host: Yeah, I agree. That's really what Lambda and Step Functions are: automation tools. That's probably the better way to describe them; it's a very good point you bring up. Another technology I think is emerging is the managed file system. The reason I find it interesting is that twenty-plus years ago I was using file systems in a university setting, when I was at Caltech, and then also in the film industry. Film has been using managed file servers with parallel processing farms for a long time. I don't know how many people know this, but in the film industry the architecture, even from around 2000, was: there's a very expensive file server, and then there are, say, 40,000 machines or 40,000 cores. That's it; that's the architecture. What's interesting is that I could see that potentially happening in the future with data science and machine learning operations: a managed NFS mount point with maybe Kubernetes or something like that. Do you see any of that on the horizon?

Guest: That's a good question. For what we're currently doing, that's probably a bit further away; in our use case, not quite. But in principle, I could very well imagine it, definitely.

Host: Then maybe a third emerging thing I'm seeing is what's going on with OpenAI and Hugging Face. That has the potential to change the game a little bit, especially Hugging Face, I think, although both of them do. In the case of pre-trained models, here's a perfect example, and I'm just brainstorming a hypothetical situation, not saying your company did this: an organization records customers talking, maybe even using AWS for this, transcribes the audio to text, and then runs some kind of criminal-detection feature on it. They could build their own models, or they could download the model that was released a day or two ago from OpenAI that transcribes things, feed the transcribed text into some other Hugging Face model that summarizes it, and then feed that into a system. So what are your thoughts on these pre-trained models, and in terms of your stack, are you thinking of looking into fine-tuning?

Guest: I think pre-trained models, and especially the way Hugging Face has really revolutionized the space by platformizing the entire market around pre-trained models, is quite incredible, and it really changes how the ecosystem does things. Looking at the cost of training large models, and at the fact that many organizations cannot do it, because of the massive cost or a lack of data, makes it very clear how important such platforms are and how important the sharing of pre-trained models actually is. I believe we're only at the beginning of that. Today you see it mostly for fairly generalized data formats: images, potentially video, text, speech, those things. But I believe we're going to see more marketplace approaches to pre-trained models in many more industries and use cases where data is to some degree standardized. Think about banking, for example: transaction data always looks kind of the same, at least at every bank. Of course you might need to do some mapping here and there, but there is a lot of power in it, simply because sharing data is always difficult, especially in Europe; sharing data between organizations is incredibly difficult legally. Sharing models is a different thing. Similar to the concept of federated learning, sharing models is significantly easier legally than actually sharing data, and then you can apply these models, fine-tune them, and so on.

Host: I can imagine. I really don't know much about banking transactions, but I would guess there are several kinds of transactions that are very normal, and then there are some where, say, you're transferring a lot of money every single second, and it happens very quickly. It's like: wait, why are you doing this? Why are you transferring money constantly? What's going on? Or a huge sum of money only ever involves three different points in the network, over and over again, just these three points constantly. And once somebody has developed an anomaly-detection model for that, why would you need to develop another one? Somebody already did it.

Guest: Exactly. Yes, absolutely. That's encoded knowledge, encoded information, in the form of a model, which abstracts away from personally identifiable data. And that's really the power: as I said before, you can share it significantly more easily and apply it to your use cases.

Host: Related to this, in terms of upcoming technologies, is, I think, dealing more with graphs. Is that something, stack-wise, that your company has investigated?

Guest: Yes. Think about bank transactions and bank customers. In our case, again, we only have pseudonymized transaction data, so we really cannot see much: we cannot see names, we cannot see IBANs or anything like that. But you can look at transactions moving between different entities, between different accounts, as a network, as a graph, and that's what we very frequently do. The nodes in the network are your accounts, or even the persons behind them, and the edges between them are your transactions. So there is this massive graph that we at TMNL, Transaction Monitoring Netherlands, are sitting on; we're actually sitting on a massive transaction graph. For us, doing analysis on top of that graph and building models on top of it is quite important.

Host: I taught a class a few years ago at Berkeley where we had to cover graph databases a little bit. I really didn't know that much about graph databases, although I did use one at a company I was at. But one of the things I learned in teaching that class was about the descriptive statistics of a graph network. It's actually pretty interesting, because most of the time everyone talks about median, max, min, and standard deviation. But with a graph there are things like centrality, and I forget all the terms off the top of my head, but you can see if there's a node in the network that everybody's interacting with.

Guest: Absolutely. You can identify communities of people moving a lot of money around all the time, for example. You can derive different metrics and features by doing computations on your graph and then plugging in a model. Often it's feature engineering: you compute centrality scores across your graph, over your different entities, then you build your features, and then you plug in some model at the end, if you're doing classic machine learning, so to say. If you do graph deep learning, of course, that's a bit different.

Host: So basically, for people who are analyzing networks of people, a graph database would be step one: generate the features, which could be centrality, there's a score, and then you go and train the model based on that descriptive statistic.

Guest: Exactly. Whether you need a graph database or not always depends on your specific use case. We're actually running that using Spark: there are GraphFrames and GraphX, real tooling in Spark built for doing analysis on graphs. And then what you usually do is exactly what you said: you try to build features based on that graph, based on the attributes of the nodes and the attributes of the edges and so on.
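A small sketch of that feature-engineering flow using GraphFrames (assuming a Spark session with the graphframes package available; the data is a toy example):

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("graph-features").getOrCreate()

# Toy transaction graph: accounts as vertices, transfers as edges.
vertices = spark.createDataFrame([("a",), ("b",), ("c",)], ["id"])
edges = spark.createDataFrame(
    [("a", "b", 5000.0), ("b", "c", 4900.0), ("c", "a", 4800.0)],
    ["src", "dst", "amount"],
)
g = GraphFrame(vertices, edges)

# Centrality-style features per account, e.g. PageRank and degree.
ranks = g.pageRank(resetProbability=0.15, maxIter=10).vertices
degrees = g.degrees  # DataFrame with columns: id, degree

# Join into a feature table a downstream model can train on.
features = ranks.join(degrees, "id")
features.show()
```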
Host: In terms of graph databases right now, it sounds like maybe the three main players are Neo4j, which has been around for a long time; Spark, I guess; and then, I forgot what the AWS one is called. Neptune, that's it. Have you played with all three of those, and did you like Neptune?

Guest: Spark, of course, we're currently using for exactly that, also because it allows us to keep our stack fairly homogeneous. We did a PoC on Neptune a while ago as well. With Neptune, you essentially have two ways to query it: either Gremlin or SPARQL. That means your data scientists need to get familiar with one of those, which is already a bit of a hurdle, because usually data scientists are familiar with neither. But what we also found is that Neptune is not necessarily built as an analytics graph database; it's not made for that. At least for us, it became quite complicated to handle the performance considerations when you run fairly complex queries across that graph.

Host: You're bringing up a point that comes up a lot in my experience with technology: sometimes the purity of the solution becomes the problem. Even though Spark isn't necessarily designed to be a graph database system, the fact is that people in your company are already using it. If you just turn on that feature, you can use it, and it's not a huge technical undertaking and retraining effort. So even if it's not as good, if it works, that's probably the solution your company will use. And I agree with you: a lot of the time, even if a solution like Neo4j, which is a pretty good example, is an interesting product, you already have all these other products; do you really want to introduce yet another one into your stack?

Guest: Yeah, because introducing it eventually comes with overhead. That's one thing. It requires someone to maintain it, and even if it's a managed service, somebody needs to own it and look after it. And, as you said, you need to retrain people to use it effectively. So it comes at a significant cost, and that is something I believe should be assessed quite critically: what is really the gain? How far can you go with your current tooling? And then, eventually, make that decision. At least personally, I'm really not a fan of thinking tooling-first. I believe in looking at your organization, looking at the people, what skills are there, and how effectively those people are performing certain activities and processes, and then carefully thinking about what really makes sense. Because it's one thing to pick a tool, but people need to adopt and use it, and eventually it should really speed them up and improve how they develop.
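For reference on the Gremlin hurdle mentioned above, a minimal gremlin-python query against a Neptune-style endpoint might look like the following; the endpoint, labels, and threshold are illustrative, and the exact client setup varies by driver version:

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a Gremlin server endpoint (Neptune exposes one like this).
conn = DriverRemoteConnection("wss://example-neptune:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Find accounts with more than 10 outgoing transfers: a typical
# starting point for fan-out style AML questions.
busy_accounts = (
    g.V().hasLabel("account")
    .where(__.outE("transfer").count().is_(P.gt(10)))
    .limit(5)
    .toList()
)
print(busy_accounts)
conn.close()
```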
Host: That's great advice, and it's hard to appreciate how good it is, because it takes the experience of getting burned by bringing in new technology. One of the mistakes I've made was putting too many different technologies into an organization, and the problem is that once you get enough complexity, it can really explode. And this is the part that gets scary. Let's take Spark, for example: how hard is it to hire somebody who knows Spark? Pretty easy. How hard is it going to be to hire somebody who knows Spark, then another person who knows the Gremlin query language for Neptune, then another who knows Kubernetes, then another... After a while, if you have so many different kinds of tools, you have to hire so many different kinds of people that all productivity grinds to a stop. So it's the hiring as well.

Guest: Absolutely. It's virtually impossible to find someone who is really well versed in Gremlin, for example; it's incredibly hard. And tech hiring is hard by itself already, so you really need to think about what you can hire for as well: what expertise can you realistically build up?

Host: That's why I think, even with some of the limitations of the ML platform, the advantage of using AWS is that you have a huge audience of people to hire from. The same goes for Spark: there are a lot of things I don't like about Spark, but a lot of people use it. So if you use AWS and Spark, let's say those two, which you are, then you're going to have a much easier time hiring people and a much easier time training them; there's tons of documentation. I think you're very wise to think that way, but a lot of people don't. They say, oh, I've got to use the latest, greatest stuff, and this and this and this, and then their company starts to get into trouble because they can't hire people, they can't maintain systems, and productivity starts to decrease.

Guest: Something else not to ignore is the cognitive load you put on a team that needs to manage a broad range of very different tools or services. It puts incredible cognitive load on that team, you suddenly need an incredible breadth of expertise, and that means you're also going to create single points of failure if you don't scale up the team. So when you go for new tooling, you should really look at it from a holistic perspective, not only at whether it's the latest and greatest.

Host: In terms of Europe versus the US, have you spent much time in the US at all?

Guest: Not at all, actually. I'm flying to the US on Monday, but no, not at all.

Host: That would also be an interesting comparison, in that the culture of the United States is really, I would say, a culture of survival of the fittest: you work seven days a week, you don't go on vacation, and you're proud of it. I'm not saying that's a good thing; I think it's a bad thing. A lot of the time, the critique people have of Europe is: oh, people take vacation all the time. As someone who has spent time in both, I would say yes, and that's the better approach. People should feel relaxed, because especially with the kind of work you do in MLOps, you need people to feel comfortable and happy. The question I was getting to is: I wonder if there is a more productive culture for MLOps in Europe versus the US in terms of maintaining systems and building software. What the US has really been good at, I guess, is coming up with new ideas, and lots of new services get generated, but the quality and longevity are not necessarily the same. Given what we just talked about, if you're trying to build a team with low turnover and very high quality output, it seems like organizations could learn from the European approach to building and maintaining systems for MLOps.

Guest: I think there's definitely some truth in that, especially when you look at the median tenure of a tech person in an organization. I think that is still significantly lower in the US; in the Bay Area, I believe it's somewhere around a year or so. In Europe it's still fairly low too, and here, of course, people in tech also like to switch companies more often, but I would say the average is still more around two years with the same company, also in tech, which I think is a bit longer than you would typically have in the US. From my perspective, having built up most of the current team, it's super important to hire good people, people who fit the team and the company culture-wise, but also not to have them in a sprint all the time. It's about having a sustainable way of working, in my opinion, and that sustainable way means you should definitely take your vacation. In Europe we have quite generous vacation, even by law: in the Netherlands you get 20 days a year by law, but most companies give you 25, and many IT companies 30 per year, which is quite nice. And culture-wise, everyone really does take it; everyone likes to take vacation, whether that's C-level or an engineer on a team, and in many companies that's really encouraged, to keep a healthy work-life balance. Of course, it's not only about vacations; it's also about growth opportunities, letting people explore and develop themselves, and not always pushing for maximum performance. So I always see it as a partnership: the organization wants to get something from an employee, but the employee should also be encouraged and developed in that organization.
An airhacks.fm conversation with Mark Sailes (@MarkSailes3) about: AWS Lambda Powertools for Java, fetching and caching secrets, the default caching retention period for short-lived secrets, the limits of the clouds, the consistency of DynamoDB, DynamoDB connectivity with AWS Lambda, RDS Proxy, "Proprietary Cloud Native Managed Service", SQS, SNS, Kinesis, the fan-out pattern is implemented with SQS and SNS; EventBridge could replace SQS and SNS for the implementation of fan-out patterns, AWS Lambda Powertools for Java Idempotency, using the request as the idempotency key, transactions, network errors and idempotency, idempotency per design, partial batch failure handling, bisecting batches on error with SQS, SQS large message handling, message offloading to S3, S3 transactional writes, dependency injection of Lambda resources, secret injection with AWS Lambda, JSON logging and AWS Lambda Powertools for Java Logging, converting JSON to metrics in CloudWatch, Mark Sailes on Twitter: @MarkSailes3, Mark's blog: mark-sailes.medium.com
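To make the fan-out pattern mentioned above concrete, here is a hedged boto3 sketch (Python rather than the Java discussed in the episode; the topic and queue names are illustrative): one SNS topic delivering each published message to multiple SQS queues.

```python
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Fan-out: one topic, many queue subscribers.
topic_arn = sns.create_topic(Name="orders")["TopicArn"]
for name in ("billing", "shipping"):
    queue_url = sqs.create_queue(QueueName=name)["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
    # Note: a real setup also needs a queue policy allowing SNS to send.

# A single publish is delivered to every subscribed queue.
sns.publish(TopicArn=topic_arn, Message=json.dumps({"orderId": 42}))
```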
From the Formula 1 World Championship to up2go: a sprint chat with Paolo about the cloud as an IoT platform on... four wheels.
About Peter
Peter's spent more than a decade building scalable and robust systems at startups across adtech and edtech. At Remind, where he's VP of Technology, Peter pushes for building a sustainable tech company with mature software engineering. He lives in Southern California and enjoys spending time at the beach with his family.

Links:
Redis: https://redis.com/
Remind: https://www.remind.com/
Remind Engineering Blog: https://engineering.remind.com
LinkedIn: https://www.linkedin.com/in/hamiltop
Email: peterh@remind101.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100 megabyte binary that doesn't eat all the data you've got on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you.

Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R, because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high-performance cloud compute at a price that, while sure, they claim it's better than AWS pricing, and when they say that, they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting vultr.com/screaming, and you'll receive $100 in credit. That's V-U-L-T-R dot com slash screaming.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn, and this is a fun episode. It is a promoted episode, which means that our friends at Redis have gone ahead and sponsored this entire episode.
I asked them, "Great, who are you going to send me from, generally, your executive suite?" And they said, "Nah. You already know what we're going to say. We want you to talk to one of our customers." And so here we are. My guest today is Peter Hamilton, VP of Technology at Remind. Peter, thank you for joining me.

Peter: Thanks, Corey. Excited to be here.

Corey: It's always interesting when I get to talk to people on promoted guest episodes when they're a customer of the sponsor, because, to be clear, you do not work for Redis. This is one of those stories you enjoy telling, but you don't personally have a stake in whether people love Redis, hate Redis, adopt it or not, which is exactly what I try to do on these shows. There's an authenticity to people who have in-the-trenches experience who aren't themselves trying to sell the thing, because that is their entire job in this world.

Peter: Yeah. You just presented three or four different opinions, and I guarantee we've felt them all at different times.

Corey: [laugh]. So, let's start at the very beginning. What does Remind do?

Peter: So, Remind is a messaging tool for education, largely K through 12. We support about 30 million active users across the country, over 2 million teachers, making sure that every student has equal opportunities to succeed and that we can facilitate as much learning as possible.

Corey: When you say messaging, that could mean a bunch of different things to a bunch of different people. Once, on a lark, I sat down, and this was years ago, so I'm sure the number is a woeful underestimate now, to count how many AWS services I could use to send a message from me to you. And this is without going into the lunacy territory of, "Well, I can tag a thing and then mail it to you like a Snowball Edge or something." No, this is using them as intended; I think I got 15 or 16 of them. When you say messaging, what does that mean to you?

Peter: So, for us, it's about communication to the end user. We will do everything we can to deliver whatever message a teacher or district administrator has to the user. We go through SMS text messaging, we go through Apple and Google's push services, we go through email, we go through voice call, really pulling out all the stops we can to make sure that these important messages get out.

Corey: And I can only imagine some of the regulatory pressure you almost certainly experience. It feels like it's not quite to HIPAA levels, where there's a private cause of action if any of this stuff gets out, but people are inherently sensitive about communications involving their children. I always sort of knew this in a general sense, and then I had kids myself, and oh yeah, suddenly I really care about those sorts of things.

Peter: Yeah. One of the big challenges is that you can build great systems that do the correct thing, but at the end of the day, we're relying on a teacher choosing the right recipient when they send a message. And so we've had to build a lot of processes and controls so that we can satisfy two conflicting needs: one is to provide a clear audit log, because that's an important thing for districts if something does happen; and the other is to be able to jump in and intervene when something inappropriate or mistaken is sent out to the wrong people.

Corey: Remind has always been one of those companies that has a somewhat exalted reputation in the AWS space.
You folks have been early adopters of a bunch of different services—which let's be clear, in the responsible way, not the, “Well, they said it on stage; time to go ahead and put everything they just listed into production because we for some Godforsaken reason, view it as a todo list.”—but you've been thoughtful about how you approach things, and you have been around as a company for a while. But you've also been making a significant push toward being cloud-native by certain definitions of that term. So, I know this sounds like a college entrance essay, but what does cloud-native mean to you?Peter: So, one of the big gaps—if you take an application that was written to be deployed in a traditional data center environment and just drop it in the cloud, what you're going to get is a flaky data center.Corey: Well, that's unfair. It's also going to be extremely expensive.Peter: [laugh]. Sorry, an expensive, flaky data center.Corey: There we go. There we go.Peter: What we've really looked at—and a lot of this goes back to our history in the earlier days; we ran on top of Heroku and it was kind of the early days of what they call the Twelve-Factor Application—but making aggressive decisions about how you structure your architecture and application so that you fit in with some of the cloud tools that are available and that you fit in, you know, with the operating models that are out there.Corey: When you say an aggressive decision, what sort of thing are you talking about? Because when I think of being aggressive with an approach to things like AWS, it usually involves Twitter, and I'm guessing that is not the direction you intend that to go.Peter: No, I think if you look at Twitter or Netflix or some of these players that, quite frankly, have defined what AWS is to us today through their usage patterns, not quite that.Corey: Oh, I mean using Twitter to yell at them explicitly about things—Peter: Oh.Corey: —because I don't do passive-aggressive; I just do aggressive.Peter: Got it. No, I think in our case, it's been plotting a very narrow path that allows us to avoid some of the bigger pitfalls. We have our sponsor here, Redis. Talk a little bit about our usage of Redis and how that's helped us in some of these cases. One of the pitfalls you'll find with pulling a non-cloud-native application and putting it in the cloud is state is hard to manage.If you put state on all your machines and machines go down, networks fail, all those things, you now no longer have access to that state and we start to see a lot of problems. One of the decisions we've made is to try to put as much data as we can into data stores like Redis or Postgres or something, in order to decouple our hardware from the state we're trying to manage and provide for users so that we're more resilient to those sorts of failures.Corey: I get the sense from the way that we're having this conversation, when you talk about Redis, you mean actual Redis itself, not ElastiCache for Redis, or, as I'm tending to increasingly think about AWS's services, Amazon Basics for Redis.Peter: Yeah. I mean, Amazon has launched a number of products. They have their ElastiCache, they have their new MemoryDB, there's a lot of different ways to use this. 
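To make that state-decoupling idea concrete, here is a minimal sketch, assuming the redis-py client, of keeping in-flight delivery state in Redis rather than in process memory so that any surviving worker can pick it up; the key names and fields are illustrative, not Remind's actual schema.

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def start_delivery(message_id: str, content: str, recipient: str) -> None:
    # Any worker can write this in-flight record; if the machine running it
    # disappears, a replacement worker reads the same key and carries on.
    key = f"delivery:{message_id}"
    r.hset(key, mapping={
        "content": content,
        "recipient": recipient,
        "status": "in_flight",
    })
    r.expire(key, 86400)  # so abandoned state does not pile up forever

def mark_delivered(message_id: str) -> None:
    r.hset(f"delivery:{message_id}", "status", "delivered")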
We've relied pretty heavily on Redis, previously known as Redis Labs, and their enterprise product in their cloud, in order to take care of our most important data—which we just don't want to manage ourselves—trying to manage that on our own using something like ElastiCache, there's so many pitfalls, so many ways that we can lose that data. This data is important to us. By having it in a trusted place and managed by a great ops team, like they have at Redis, we're able to then lean in on the other aspects of cloud data to really get as much value as we can out of AWS.Corey: I am curious. As I said you've had a reputation as a company for a while in the AWS space of doing an awful lot of really interesting things. I mean, you have a robust GitHub presence, you have a whole bunch of tools that have come out of Remind that are great, I've linked to a number of them over the years in the newsletter. You are clearly not afraid, culturally, to get your hands dirty and build things yourself, but you are using Redis Enterprise as opposed to open-source Redis. What drove that decision? I have to assume it's not, “Wait. You mean, I can get it for free as an open-source project? Why didn't someone tell me?” What brought you to that decision?Peter: Yeah, a big part of this is what we could call operating leverage. Building a great set of tools that allow you to get more value out of AWS is a little different story than babysitting servers all day and making sure they stay up. So, if you look through, most of our contributions in the open-source space have really been around here's how to expand upon these foundational pieces from AWS; here's how to more efficiently launch a suite of servers into an auto-scaling group; here's, you know, our troposphere and other pieces there. This was all before Amazon's CDK product, but really, it was, here's how we can more effectively use CloudFormation to capture our Infrastructure as Code. And so we are not afraid in any way to invest in our tooling and invest in some of those things, but when we look at the trade-off of directly managing stateful services and dealing with all the uncertainty that comes with it, we feel our time is better spent working on our product and delivering value to our users and relying on partners like Redis in order to provide that stability we need.Corey: You raise a good point. An awful lot of the tools that you've put out there are the best, from my perspective, approach to working with AWS services. And that is a relatively thin layer built on top of them with an eye toward making the user experience more polished, but not being so heavily opinionated that as soon as the service goes in a different direction, the tool becomes completely useless. You just decide to make it a bit easier to wind up working with specific environment variables or profiles, rather than what appears to be the AWS UX approach of, “Oh, now type in your access key, your secret key and your session token, and we've disabled copy and paste. Go, have fun.” You've really done a lot of quality of life improvements, more so than a “this is the entire system of how we do deploys, start to finish.” It's opinionated and sort of, like, a take on what Netflix did once upon a time with Asgard. It really feels like it's just the right level of abstraction.Peter: We did a pretty good job. I will say, you know, years later, we felt that we got it wrong a couple times. 
It's been really interesting to see that, that there are times when we say, “Oh, we could take these three or four services and wrap it up into this new concept of an application.” And over time, we just have to start poking holes in that new layer and we start to see we would have been better served by sticking with as thin a layer as possible that enables us, rather than trying to get these higher-level pieces.Corey: It's remarkably refreshing to hear you say that just because so many people love to tell the story on podcasts, or on conference stages, or whatever format they have of, “This is what we built.” And it is an aspirationally superficial story about this. They don't talk about the, “Well, firstly, we went down these three wrong paths first.” It's always a, “Oh, yes, obviously, we are smart people and we only make the correct decision.”And I remember in the before times sitting in conference talks, watching people talk about great things they'd done, and I'll turn next to the person next to me and say, “Wow, I wish I could be involved in a project like that.” And they'll say, “Yeah, so do I.” And it turns out they work at the company the speaker is from. Because all of these things tend to be the most positive story. Do you have an example of something that you have done in your production environment that going back, “Yeah, in hindsight, I would have done that completely differently.”Peter: Yeah. So, coming from Heroku moving into AWS, we had a great open-source project called Empire, which kind of bridged that gap between them, but used Amazon's ECS in order to launch applications. It was actually command-line compatible with the Heroku command when it first launched. So, a very big commitment there. And at the time—I mean, this comes back to the point I think you and I were talking about earlier, where architecture, costs, infrastructure, they're all interlinked.And I'm a big fan of Conway's Law, which says that an organization's structure needs to match its architecture. And so six, seven years ago, we were a heavy growth-based company and we had interns running around, doing all the things, and we wanted to have really strict guardrails and a narrow set of things that our development team could do. And so we built a pretty constrained model: You will launch, you will have one Docker image per ECS service, it can only do these specific things. And this allowed our development team to focus on pretty buttons on the screen and user engagement and experiments and whatnot, but as we've evolved as a company, as we built out a more robust business, we've started to track revenue and costs of goods sold more aggressively, and we've seen there's a lot of inefficient things that come out of it.One particular example was we used PgBouncer for our connection pooling to our Postgres application. In the traditional model, we had an auto-scaling group for PgBouncer, and then our auto-scaling groups for the other applications would connect to it. And we saw additional latency, we saw additional cost, and we eventually kind of wound that down and packaged that PgBouncer alongside the applications that needed it. And this was a configuration that wasn't available on our first pass; it was something we intentionally did not provide to our development team, and we had to unwind that. 
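For a rough picture of what that repackaging can look like, here is a hedged sketch of an ECS task definition registered with boto3, with PgBouncer as a sidecar container next to the application; the family name, images, ports, and sizes are placeholders rather than Remind's actual configuration.

import boto3

ecs = boto3.client("ecs")

# One task, two containers: the app plus a PgBouncer sidecar. With the
# awsvpc network mode, containers in a task share a network namespace,
# so the app reaches PgBouncer over localhost instead of crossing the
# network to a separate PgBouncer auto-scaling group.
ecs.register_task_definition(
    family="web-with-pgbouncer",
    networkMode="awsvpc",
    requiresCompatibilities=["EC2"],
    cpu="512",
    memory="1024",
    containerDefinitions=[
        {
            "name": "pgbouncer",
            "image": "example/pgbouncer:latest",  # placeholder image
            "portMappings": [{"containerPort": 6432}],
            "essential": True,
        },
        {
            "name": "app",
            "image": "example/app:latest",  # placeholder image
            "environment": [
                {"name": "DATABASE_URL",
                 "value": "postgres://app@127.0.0.1:6432/app"},
            ],
            "essential": True,
        },
    ],
)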
And when we did, we saw better performance, we saw better cost efficiency, all sorts of benefits that we care a lot about now that we didn't care about as much, many years ago.Corey: It sounds like you're describing some semblance of an internal platform, where instead of telling all your engineers, effectively, “Well, here's the console. Ideally, you use some form of Infrastructure as Code. Good luck. Have fun.” You effectively gate access to that. Is that something that you're still doing or have you taken a different approach?Peter: So, our primary gate is our Infrastructure as Code repository. If you want to make a meaningful change, you open up a PR, it's got to go through code review, you need people to sign off on it. Anything that's not there may not exist tomorrow. There's no guarantees. And we've gone around, occasionally just shut random servers down that people spun up in our account.And sometimes people will be grumpy about it, but you really need to enforce that culture that we have to go through the correct channels and we have to have this cohesive platform, as you said, to support our development efforts.Corey: So, you're a messaging service in education. So, whenever I do a little bit of digging into backstories of companies and what has made, I guess, an impression, you look for certain things and explicit dates are one of them, where on March 13th of 2020, your business changed just a smidgen. What happened other than the obvious, we never went outside for two years?Peter: [laugh]. So, if we roll back a week—you know, that's March 13th, so if we roll back a week, we're looking at March 6th. On that day, we sent out about 60 million messages over all of our different mediums: Text, email, push notifications. On March 13th that was 100 million, and then, a few weeks later on March 30th, that was 177 million. And so our traffic effectively tripled over the course of those three weeks. And yeah, that's quite a ride, let me tell you.Corey: The opinion that a lot of folks have who have not gotten to play in sophisticated distributed systems is, “Well, what's the hard part there? You have an auto-scaling group. Just spin up three times the number of servers in that fleet and problem solved. What's challenging?” A lot, but what did you find that the pressure points were?Peter: So, I love that example, that your auto-scaling group will just work. By default, Amazon's auto-scaling groups only support 1000 backends. So, when your auto-scaling group goes from 400 backends to 1200, things break, [laugh] and not in ways that you would have expected. You start to learn things about how database systems provided by Amazon have limits other than CPU and memory. And they're clearly laid out—there are network bandwidth limits and things you have to worry about.We had a pretty small team at that time and we'd gotten into this cadence where every Monday morning, we would wake up at 4 a.m. Pacific because as part of the pandemic, our traffic shifted, so our East Coast users would be most active in the morning rather than the afternoon. And so about 7 a.m. on the East Coast is when everyone came online. And we had our Monday morning crew there, just looking to see where the next pain point was going to be.We'd have Monday to walk through it all; Monday afternoon, we'd meet together and come up with our three or four hypotheses on what would break if our traffic doubled again, and we'd spend the rest of that next week addressing those the best we could and repeat for the next Monday. 
And we did this for three, four or five weeks in a row, and finally, it stabilized. But yeah, it's all the small little things, the things you don't know about, the limits in places you don't recognize that just catch up to you. And you need to have a team that can move fast and adapt quickly.Corey: You've been using Redis for six, seven years, something along those lines, as an enterprise offering. You've been working with the same vendor who provides this managed service for a while now. What are the fruits of that relationship? What is the value that you see by continuing to have a long-term relationship with vendors? Because let's be serious, most of us don't stay in jobs that long, let alone work with the same vendor.Peter: Yeah. So, coming back to the March 2020 story, many of our vendors started to see some issues where various services weren't scaled properly. We made a lot of phone calls to a lot of vendors in working with them, and I was… very impressed with how Redis Labs at the time was able to respond. We hopped on a call, they said, “Here's what we think we need to do, we'll go ahead and do this. We'll sort this out in a few weeks and figure out what this means for your contract. We're here to help and support in this pandemic because we recognize how this is affecting everyone around the world.”And so I think when you get in those deeper relationships, those long-term relationships, it is so helpful to have that trust, to have a little bit of that give when you need it in times of crisis, and that they're there and willing to jump in right away.Corey: There's a lot to be said for having those working relationships before you need them. So often, I think that a lot of engineering teams just don't talk to their vendors to a point where they may as well be strangers. But you'll see this most notably—at least I feel it most acutely—with AWS service teams. They'll do a whole kickoff when the enterprise support deal is signed, three years go past, and both the AWS team and the customer's team have completely rotated since then, and they may as well be strangers. Being able to have that relationship to fall back on in those really weird, really high-stress moments has been one of those things where I didn't see the value myself until the first time I went through a hairy situation where I found that that was useful.And now it's oh, I—I now bias instead for, “Oh, I can fit into the free tier of this service. No, no, I'm going to pay and become a paying customer.” I'd rather be a customer that can have that relationship and pick up the phone than someone whining at people in a forum somewhere of, “Hey, I'm a free user, and I'm having some problems with production.” Just never felt right to me.Peter: Yeah, there's nothing worse than calling your account rep and being told, “Oh, I'm not your account rep anymore.” Somehow you missed the email, you missed who it was. Prior to Covid, you know—and we saw this many, many years ago—one of the things about Remind is every back-to-school season, our traffic 10Xes in about three weeks. And so we're used to emergencies happening and unforeseen things happening. And we plan through our year and try to do capacity planning and everything, but we've been around the block a couple of times.And so we have a pretty strong culture now of leaning in hard with our support reps. We have them in our Slack channels. Our AWS team, we meet with often. Redis Labs, we have them on Slack as well. 
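On the “limits in places you don't recognize” point: one way to enumerate at least the documented ones ahead of time is the Service Quotas API, which postdates parts of this story. A small boto3 sketch follows; the service codes are chosen purely for illustration, and some limits, such as per-resource bandwidth, never show up there.

import boto3

quotas = boto3.client("service-quotas")

def print_quotas(service_code: str) -> None:
    # Walk every quota AWS publishes for the given service and print it.
    paginator = quotas.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode=service_code):
        for q in page["Quotas"]:
            print(f'{service_code}: {q["QuotaName"]} = {q["Value"]}')

for code in ("autoscaling", "elasticloadbalancing", "ec2"):
    print_quotas(code)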
We're constantly talking about databases that may or may not be performing as we expect them to. They're an extension of our team. We have an incident; we get paged. If it's related to one of the services, we hit them in Slack immediately and have them start checking on the back end while we're checking on our side. So.Corey: One of the biggest takeaways I wish more companies would have is that when you are dependent upon another company to effectively run your production infrastructure, they are no longer your vendor, they're your partner, whether you want them to be or not. And approaching it with that perspective really pays dividends down the road.Peter: Yeah. One of the things you get when you've been at a company for a long time and been in a relationship for a long time is growing together, which is always an interesting approach. And seeing, sometimes there's some painful points; sometimes you're on an old legacy version of their product that you were literally the last customer on, and you've got to work with them to move off of it. But you were there six years ago when they were just starting out, and they've seen how you grow, and you've seen how they've grown, and you've kind of been able to marry that experience together in a meaningful way.Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services spanning infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free. That's snark.cloud/oci-free.Corey: Redis is, these days, a data platform. Back once upon a time, I viewed it as more of a caching layer. And I admit that the capabilities of the platform have significantly advanced since those days when I viewed it purely through the lens of cache. But one of the interesting parts is that neither one of those use cases, in my mind, blends particularly well with heavy use of Spot Fleets, but you're doing exactly that. What are you folks doing over there?Peter: [laugh]. Yeah, so as I mentioned earlier, coming back to some of the Twelve-Factor App design, we heavily rely on Redis as sort of a distributed heap. One of our challenges of delivering all these messages is every single message has its in-flight state: Here's the content, here's who we sent it to, we wait for them to respond. On a traditional application, you might have one big server that stores it all in-memory, and you get the incoming requests, and you match things up. By moving all that state to Redis, all of our workers, all of our application servers, we know they can disappear at any point in time.We use Amazon's Spot Instances and their Spot Fleet for all of our production traffic. 
Every single web service, every single worker that we have runs on this infrastructure, and we would not be able to do that if we didn't have a reliable and robust place to store this data that is in-flight and currently being accessed. So, we'll have a couple hundred gigs of data at any point in time in a Redis database, just representing in-flight work that's happening on various machines.Corey: It's really neat seeing Spot Fleets being used as something more than a theoretical possibility. It's something I've always been very interested in, obviously, given the potential cost savings; they approach cheap-as-free in some cases. But it turns out—we talked earlier about the idea of being cloud-native versus the rickety, expensive data center in the cloud, and an awful lot of applications are simply not built in a way that yeah, we're just going to randomly turn off a subset of your systems, ideally, with two minutes of notice, but all right, have fun with that. And a lot of times, it just becomes a complete non-starter, even for stateless workloads, just based upon how all of these things are configured. It is really interesting to watch a company that has an awful lot of responsibility that you've been entrusted with who embraces that mindset. It's a lot more rare than you'd think.Peter: Yeah. And again, you know, sometimes, we overbuild things, and sometimes we go down paths that may have been a little excessive, but it really comes down to your architecture. You know, it's not just having everything running on Spot. It's making effective use of SQS and other queueing products at Amazon to provide checkpointing abilities, and so you know that should you lose an instance, you're only going to lose a few seconds of productive work on that particular workload and be able to pick up where you left off.It's properly using auto-scaling groups. From the financial side, there's all sorts of weird quirks you'll see. You know, the Spot market has a wonderful set of dynamics where the big instances are much, much cheaper per CPU than the small ones are on the Spot market. And so structuring things in a way that you can colocate different workloads onto the same hosts and hedge against the host going down by spreading across multiple availability zones. I think there's definitely a point where having enough workload, having enough scale allows you to take advantage of these things, but it all comes down to the architecture and design that really enables it.Corey: So, you've been using Redis for longer than I think many of our listeners have been in tech.Peter: [laugh].Corey: And the key distinguishing points for me between someone who is an advocate for a technology and someone who's a zealot—or a pure critic—is they can identify use cases for which it is great and use cases for which it is not likely to be a great experience. In your time with Redis, what have you found that it's been great at and what are some areas that you would encourage people to consider more carefully before diving into it?Peter: So, we like to joke that five, six years ago, most of our development process was, “I've hit a problem. Can I use Redis to solve that problem?” And so we've tried every solution possible with Redis. We've done all the things. We have a number of very complicated Lua scripts that are managing different keys in an atomic way.Some of these have been more successful than others, for sure. 
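For a flavor of those atomic Lua scripts, here is a toy example, assuming redis-py, that atomically claims an in-flight message so exactly one worker processes it; in this simple case a plain SET with NX and EX would suffice, but the script form extends to invariants spanning several keys.

import redis

r = redis.Redis(decode_responses=True)

# KEYS[1] = claim key, ARGV[1] = worker id, ARGV[2] = lease in seconds.
# The whole script runs atomically inside Redis, so the existence check
# and the claim can never interleave with another worker's attempt.
CLAIM = r.register_script("""
if redis.call('EXISTS', KEYS[1]) == 0 then
  redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2])
  return 1
end
return 0
""")

def try_claim(message_id: str, worker_id: str, lease_seconds: int = 30) -> bool:
    # Returns True only for the single worker that wins the claim; the
    # lease expiring frees the message if that worker's Spot host vanishes.
    return CLAIM(keys=[f"claim:{message_id}"],
                 args=[worker_id, lease_seconds]) == 1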
Right now, our biggest philosophy is, if it is data we need quickly, and it is data that is important to us, we put it in Enterprise Redis, the cloud product from Redis. Other use cases, there's a dozen things that you can use for a cache, Redis is great for cache, memcache does a decent job as well; you're not going to see a meaningful difference between those sorts of products. Where we've struggled a little bit has been when we have essentially relational data that we need fast access to. And we're still trying to find a clear path forward here because you can do it and you can have atomic updates and you can kind of simulate some of the ACID characteristics you would have in a relational database, but it adds a lot of complexity.And that's a lot of overhead to our team as we're continuing to develop these products, to extend them, to fix any bugs you might have in there. And so we're kind of recalibrating a bit, and some of those workloads are moving to other data stores where they're more appropriate. But at the end of the day, it's data that we need fast, and it's data that's important, we're sticking with what we got here because it's been working pretty well.Corey: It sounds almost like you started off with the mindset of one database for a bunch of different use cases and you're starting to differentiate into purpose-built databases for certain things. Or is that not entirely accurate?Peter: There's a little bit of that. And I think coming back to some of our tooling, as we kind of jumped on a bit of the microservice bandwagon, we would see, here's a small service that only has a small amount of data that needs to be stored. It wouldn't make sense to bring up an RDS instance, or an Aurora instance, for that, you know, in Postgres. Let's just store it in an easy store like Redis. And some of those cases have been great, some of them have been a little problematic.And so as we've invested in our tooling to make all our databases accessible and make it less of a weird trade-off between what the product needs, what we can do right now, and what we want to do long-term, and reduce that friction, we've been able to be much more deliberate about the data store that we choose in each case.Corey: It's very clear that you're speaking with a voice of experience on this where this is not something that you just woke up and figured out. One last area I want to go into with you is when I asked you what it is you care about primarily as an engineering leader and as you look at serving your customers well, you effectively had a dual answer, almost off the cuff, of stability and security. I find the two of those things are deeply intertwined in most of the conversations I have, but they're rarely called out explicitly in quite the way that you do. Talk to me about that.Peter: Yeah, so in our wild journey, stability has always been a challenge. And we've always, you know, been in early startup mode, where you're constantly pushing what can we ship? How quickly can we ship it? And in our particular space, we feel that this communication that we foster between teachers and students and their parents is incredibly important, and is a thing that we take very, very seriously. 
And so, a couple years ago, we were trying to create this balance and create not just a language that we could talk about on a podcast like this, but really a way of framing these concepts to our company internally: To our engineers, to help them think, as they're building a feature, about what are the things they should think about, what are the concerns beyond the product spec; to work with our marketing and sales team to help them understand why we're making these investments that may not get a particular feature out by X date but are still a worthwhile investment.So, from the security side, we've really focused on building out robust practices and robust controls that don't necessarily lock us into a particular standard, like PCI compliance or things like that, but really focusing on the maturity of our company and, you know, our culture as we go forward. And so we're in a place now where we are ISO 27001; we're heading into our third year. We leaned in hard on our disaster recovery processes, we've leaned in hard on our bug bounties, pen tests, and, kind of, found this incremental approach that, you know—day one, I remember we turned on our bug bounty and it was a scary day as the reports kept coming in. But we take on one thing at a time and continue to build on it and make it an essential part of how we build systems.Corey: It really has to be built in. It feels like security is not something that can be slapped on as an afterthought, however much companies try to do that. Especially, again, as we started this episode with, you're dealing with communication with people's kids. That is something that people have remarkably little sense of humor around. And rightfully so.Seeing that there is as much if not more care taken around security than there is stability is generally the sign of a well-run organization. If there's a security lapse, I expect certain vendors to rip the power out of their data centers rather than run in an insecure fashion. And your job done correctly—which clearly you have gotten to—means that you never have to make that decision because you've approached this the right way from the beginning. Nothing's perfect, but there's always the idea of actually caring about it being the first step.Peter: Yeah. And the other side of that was talking about stability, and again, it's avoiding the either/or situation. We can work in, as well, along those two—stability and security—our cost of goods sold and our operating leverage in other aspects of our business. And in every single one of them, our co-number one priorities are stability and security. And if it costs us a bit more money, if it takes our dev team a little longer, there's not a choice at that point. We're doing the correct thing.Corey: Saving money is almost never the primary objective of any company that you really want to be dealing with unless something bizarre is going on.Peter: Yeah. Our philosophy on, you know, any cost reduction has been this should have zero negative impact on our stability. If we do not feel we can safely do this, we won't. And coming back to the Spot Instance piece, that was a journey for us. And you know, we tested the waters a bit and we got to a point, we worked very closely with Amazon's team, and we came to that conclusion that we can safely do this. And we've been doing it for over a year and seen no adverse effects.Corey: Yeah. And a lot of shops I've talked to folks about this. When we go and do a consulting project, it's, “Okay. 
There's a lot of things that could have been done before we got here. Why hasn't any of that been addressed?” And the answer is, “Well. We tried to save money once and it caused an outage and then we weren't allowed to save money anymore. And here we are.” And I absolutely get that perspective. It's a hard balance to strike. It always is.Peter: Yeah. The other aspect where stability and security kind of intertwine is you can think about security as InfoSec in our systems and locking things down, but at the end of the day, why are we doing all that? It's for the benefit of our users. And Remind, as a communication platform—the safety and security of our users is dependent on us being up and available so that teachers can reach out to parents with important communication. And things like attendance, things like natural disasters, or lockdowns, or any of the number of difficult situations schools find themselves in. This is part of why we take that stewardship so seriously: Being up and protecting a user's data just has such a huge impact on education in this country.Corey: It's always interesting to talk to folks who insist they're making the world a better place. And it's, “What do you do?” “We're improving ad relevance.” I mean, “Okay, great, good for you.” You're serving a need that I would not shy away from classifying, fundamentally, as critical infrastructure, and that is always a good conversation to have. It's nice being able to talk to folks who are doing things that you can unequivocally look at and say, “This is a good thing.”Peter: Yeah. And around 80% of public schools in the US are using Remind in some capacity. And so we're not a product that's used in a few specific regions. It's all across the board. One of my favorite things about working at Remind is meeting people and telling them where I work, and they recognize it.They say, “Oh, I have that app, I use that app. I love it.” And I spent years in ads before this, and you know, I've been there and no one ever told me they were glad to see an ad. That's never the case. And it's been quite a rewarding experience coming in every day, and as you said, being part of this critical infrastructure. That's a special thing.Corey: I look forward to installing the app myself as my eldest prepares to enter public school in the fall. So, now at least I'll have a hotline of exactly where to complain when I didn't get the attendance message because, you know, there's no customer quite like a whiny customer.Peter: They're still customers. [laugh]. Happy to have them.Corey: True. We tend to be. I want to thank you for taking so much time out of your day to speak with me. If people want to learn more about what you're up to, where's the best place to find you?Peter: So, from an engineering perspective at Remind, we have our blog, engineering.remind.com. If you want to reach out to me directly, I'm on LinkedIn; good place to find me, or you can just reach out over email directly, peterh@remind101.com.Corey: And we will put all of that into the show notes. Thank you so much for your time. I appreciate it.Peter: Thanks, Corey.Corey: Peter Hamilton, VP of Technology at Remind. This has been a promoted episode brought to us by our friends at Redis, and I'm Cloud Economist Corey Quinn. 
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry and insulting comment that you will then hope that Remind sends out to 20 million students all at once.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
In this episode we talked about serverless services in AWS. We covered the main services, such as AWS Lambda, API Gateway, EventBridge, SNS, SQS, and StepFunctions, and how they help you build new microservice, event-driven applications. We walked through typical usage scenarios for these services, shared best practices for using them, and explained how to build CI/CD for your Lambdas. We also mentioned some cases where serverless is not the best option. Timecodes: 00:00:24 - Roma Boyko joins as a guest 00:02:28 - Why serverless 00:07:20 - The 4 characteristics of serverless 00:09:05 - The list of serverless services in AWS 00:12:00 - Main use cases 00:16:23 - How to build CI/CD for serverless 00:22:20 - Lambda limitations 00:24:30 - AWS Lambda power tuning 00:28:43 - Warming up Lambda functions Links: Best practices of advanced serverless developers AWS Lambda power tuning If you have questions or topic suggestions, write to me on LinkedIn - https://www.linkedin.com/in/vedmich/
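As a minimal taste of the event-driven pattern the episode describes, here is a sketch of a Python Lambda handler behind an SQS trigger; the payload fields are illustrative, and the batchItemFailures response shape assumes the ReportBatchItemFailures option is enabled on the event source mapping.

import json

def handler(event, context):
    # An SQS trigger delivers a batch of records; each record's body is
    # the raw message we enqueued.
    for record in event["Records"]:
        payload = json.loads(record["body"])
        print(f"processing order {payload.get('order_id')}")  # illustrative field
    # Raising an exception instead would make SQS redeliver the batch;
    # an empty failure list tells Lambda every record succeeded.
    return {"batchItemFailures": []}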
Luciano and Eoin explore the wonderful world of data streaming using Kafka on AWS. In this episode we focus mainly on Managed Streaming for Kafka (or MSK) and discuss the main differences between MSK and Kinesis. We also explore the main features that MSK provides, its scaling characteristics, pricing and, finally, how MSK works in conjunction with other AWS services. We conclude the episode by providing a decision tree that should help you decide whether you should use Kinesis or MSK or avoid streaming services entirely in favor of something like SNS or SQS. In this episode, we mentioned the following resources: - Our previous episode on Kinesis data streams: https://www.youtube.com/watch?v=u_nR6up4Kvs - Our series of Event services: https://www.youtube.com/watch?v=CG7uhkKftoY&list=PLAWXFhe0N1vLHkGO1ZIWW_SZpturHBiE_ - AWS MSK sizing spreadsheet: https://dy7oqpxkwhskb.cloudfront.net/MSK_Sizing_Pricing.xlsx - Should My Startup use Kinesis or MSK? - https://www.youtube.com/watch?v=TJS19EuzH2k - Intro to MSK (reinvent talk from 2018) - https://www.youtube.com/watch?v=9nKswHsLseY - Running Apache Kafka on AWS (by Frank Munz) - https://www.youtube.com/watch?v=HtU9pb18g5Q - Cloudonaut - Kinesis versus MSK - https://www.youtube.com/watch?v=kcBAKz0MPf8 This episode is also available on YouTube: https://www.youtube.com/AWSBites You can listen to AWS Bites wherever you get your podcasts: - Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017 - Spotify: https://open.spotify.com/show/3Lh7PzqBFV6yt5WsTAmO5q - Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw== - Breaker: https://www.breaker.audio/aws-bites - RSS: https://anchor.fm/s/6a3312a0/podcast/rss Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter: - https://twitter.com/eoins - https://twitter.com/loige
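One plausible reading of that decision tree, expressed as Python pseudologic; the criteria are assumptions for illustration, not the hosts' exact thresholds.

def pick_messaging_service(high_throughput_streaming: bool,
                           kafka_ecosystem: bool,
                           needs_replay: bool) -> str:
    # No real streaming need: plain eventing/queueing is simpler and cheaper.
    if not high_throughput_streaming:
        return "SNS or SQS"
    # Existing Kafka tooling, connectors, or client code pulls toward MSK.
    if kafka_ecosystem:
        return "MSK"
    # Both retain records for replay; Kinesis is the lower-ops default.
    if needs_replay:
        return "Kinesis or MSK"
    return "Kinesis"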
Luciano and Eoin deep dive into SNS discussing what it does, how it differs from EventBridge and SQS and how you can use it to send messages to customers but also for microservices communication. In this new episode dedicated to AWS events and messaging services, we learn everything there is to know about SNS including advantages, limitations and cost. This episode complements the episode about EventBridge, giving another perspective on when to use SNS and when to pick EventBridge instead. In this episode, we mentioned the following resources: - Our previous episode about EventBridge: https://www.youtube.com/watch?v=UjIE5qp-v8w - Our previous episode about all things SQS: https://www.youtube.com/watch?v=svoA-ds8-8c - Our introductory episode about what services you should use for events: https://www.youtube.com/watch?v=CG7uhkKftoY - A comparison between EventBridge and SNS by Cloudonaut: https://cloudonaut.io/eventbridge-vs-sns/ This episode is also available on YouTube: https://www.youtube.com/AWSBites You can listen to AWS Bites wherever you get your podcasts: - Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017 - Spotify: https://open.spotify.com/show/3Lh7PzqBFV6yt5WsTAmO5q - Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw== - Breaker: https://www.breaker.audio/aws-bites - RSS: https://anchor.fm/s/6a3312a0/podcast/rss Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter: - https://twitter.com/eoins - https://twitter.com/loige
About Randall: Randall Hunt, VP of Cloud Strategy and Solutions at Caylent, is a technology leader, investor, and hands-on-keyboard coder based in Los Angeles, CA. Previously, Randall led software and developer relations teams at Facebook, SpaceX, AWS, MongoDB, and NASA. Randall spends most of his time listening to customers, building demos, writing blog posts, and mentoring junior engineers. Python and C++ are his favorite programming languages, but he begrudgingly admits that Javascript rules the world. Outside of work, Randall loves to read science fiction, advise startups, travel, and ski.Links: Caylent.com: https://caylent.com/ Twitter: https://twitter.com/jrhunt Riot Games Talk: https://youtu.be/oGK-ojM7ZMc James Hamilton Talk: https://youtu.be/uj7Ting6Ckk Transcript. Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.Corey: It seems like there is a new security breach every day. Are you confident that an old SSH key or a shared admin account isn't going to come back and bite you? If not, check out Teleport. Teleport is the easiest, most secure way to access all of your infrastructure. The open source Teleport Access Plane consolidates everything you need for secure access to your Linux and Windows servers—and I assure you there is no third option there. Kubernetes clusters, databases, and internal applications like AWS Management Console, Jenkins, GitLab, Grafana, Jupyter Notebooks, and more. Teleport's unique approach is not only more secure, it also improves developer productivity. To learn more visit: goteleport.com. And no, that is not me telling you to go away, it is: goteleport.com.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. About a year ago, from the time of this recording, I had Randall Hunt on this podcast and we had a great conversation. He worked elsewhere, did different things, and midway through the recording, there was a riot slash coup attempt at the US Capitol. Yeah.So, talking to Randall was the best thing that happened to me that day. And I'm hoping that this recording is a lot less eventful. Randall, thank you for joining me once again.Randall: It's great to see you, buddy. It's been a long time.Corey: It really has.Randall: Well, I guess we saw each other at re:Invent.Corey: We did, but that was—re:Invent as a separate, otherworldly place called Las Vegas. But since then, you've taken a new role. You are now the VP of Cloud Strategy and Solutions at Caylent. And the first reaction I had to that was, “What the hell is a Caylent? 
Let's find out.”So, I pulled up the website, and it was—you're an AWS partner, from what I was able to figure out, but you didn't lead with that, which is a great thing because, “We're an AWS partner” is the least effective marketing strategy I can imagine. You are doing consulting on the implementation side the way that I would approach implementation consulting if I were down that path. Which I'm very much not, I'm pure advisory around one problem. But you talk about solutions, you talk about outcomes for your customers, you don't try to be all things to all people. You're Randall Hunt; you have a lot of options when it comes to what you do for careers. How did you wind up at Caylent?Randall: Well, you know, I was doing a startup for a little while, and unfortunately, you know, I lost some people in my family. And I was just, like, a little mentally burnt out, so I took a break. And I had already bought my re:Invent ticket and everything. So, then I was like, “Okay, well, I'll go to re:Invent; I'll see everybody and try and avoid getting Covid.” So, I was masked up the whole time.And while I was there, I ran into this group of folks who are on Caylent. And I did some research on them, and then we had some meetings. And I had already been kind of chatting with a bunch of different AWS partners, consulting partners, big and small. And none of them really stood out to me. I'm not trying to diss on any of these other partners because I think they're all pretty amazing in what they do, but then a lot of them are just kind of… the same shop, pushing out the same code. They don't have this operational excellence.Corey: Swap one partner for another in many cases, and there's not a lot of difference perceivable from the customer side of the story. And I know you're going to be shocked by this, but I'm not a huge fan of the way that AWS talks about these things, with their messaging. Imagine that? Like, and sure enough in the partner program, AWS continues what it does with services, and gives things bad names. In this case, it's a ‘competency.' If we used to work together and someone reaches out for a reference check, and I say, “Randall? Oh, yeah. He was competent.”Randall: [laugh].Corey: That has a lot of implications that aren't necessarily positive. It feels almost begrudging when they frame it that way. And it's just odd.Randall: The way that I look at it—so I don't know if you've ever been through this program, but in order to achieve those competencies, you have to demonstrate. So, in order to be able to list them on your little partner card, right, or in the marketplace or whatever, you have to be able to go and say, “These are five customers where we delivered.” And then AWS will go and talk to those customers and ask for your satisfaction scores from those customers. You have to explain which services you use, what the initial set was, and then what the outcome was. And so it's a big matrix that they make you fill out to accomplish each of these, and you have to have real-world customer examples.So, I like that there's that verification for people to know, but I don't think that AWS does a great job of explaining what that means. Like, what goes into getting a competency. And I don't know how to explain it quickly.Corey: Same here. 
When I look at those partner cards on various websites—in some cases above the fold on the landing page—they list out all the different competencies, and it's, on some level, if I know what all of those things are, and what they imply, and how that works, for a lot of problems I don't need a partner at that point because at that point I'm deep enough in the weeds to do a lot of it myself. To be clear, I'm the exact opposite, an outlier type that most companies probably should not emulate. One of our marketing approaches here at The Duckbill Group has been we are not AWS partners, as a selling point of all things. We're not partnering with any company in the space, just due to real or perceived conflicts of interest.We also do one very specific, very expensive problem in an advisory sense, and that is it. If we were doing implementation, and we lead with, “Oh, yeah. We're not AWS partners,” it doesn't go so well. I once was talking to somebody who wanted me to do a security assessment there, and, “All right, it's not what we do, but”—this was early days, and I gave the talk, and it turns out every talking point I've got for what works well in the costing space makes me look deranged when I'm talking about another space, it's like, “Oh, yeah. We're doing security stuff. Yeah, but we're no AWS partners, and we're not part of any vendor in this space.”“That sounds actively dangerous and harmful. What the hell is the matter with you people?” Because security is a big space, and you need to work closely with cloud providers when doing security things there. The messaging doesn't [laugh] land quite the same way. That's why I don't do other kinds of consulting these days.Randall: Yeah. But to your question of what the hell is a Caylent?Corey: [laugh].Randall: So, Caylent's name is derivative of Caylus, which is a God from Roman mythology. And I think it's the root of the word celestial. But I just looked up the etymology, and I can't confirm that. But let's be real, you know—Corey: Well, hang on a second because we look at Athena, which is AWS's service named after the Greek Goddess of spending money on cloud services—Randall: [laugh].Corey: We have Kubernetes, which is the Greek God of cosplaying as a Google engineer. And I'm not a huge fan of either of those things, so why am I going to like Caylent any better?Randall: Oh, it's Roman, not Greek.Corey: Ah, that would do it.Randall: [laugh]. No, I—beyond meeting the team there, and then reading through some of their case studies and projects when I was at re:Invent, in my first day at the company, I just went around and I just spoke with as many of the engineers as I could. And I was blown away at some of the cool stuff that they're working on and some of the talent. And here's the thing is, Caylent was pretty small last year, you know? I think they were at 30 people sometime last year, and now they're—it's, you know, 400% growth, almost.And they've done some really, really cool important work during Covid. For companies like eMed. They've done some work for, you know, all of these other firms. But between you and me—let's get down to business, which is, you know I love space.Corey: Oh, yeah. To be clear, when we're talking about space, that can mean a bunch of different things. Like, “Honestly, don't be near me,” could be how I interpret that.Randall: You know how I love space, and rockets, and orbital mechanics, and satellites, and these sorts of things. And SciFi. 
And Caylent's whole branding scheme is around this little guy called the [Caylien 00:07:28]. It's our little mascot, a little alien dude, and it is kind of our whole branding persona. And everything else that we do is rockets. And we don't have onboarding, we have launch plans. That whole branding, it seems silly but—Corey: It's very evocative, the Roman mythology, I think that's a great direction to go in. I realized that for the start of this episode, I forgot to give folks who are not familiar with you a bit of backstory. You've done a lot of things: You worked at NASA, and then you were at MongoDB, and you were a boomerang at AWS—your second time there is where I wound up meeting you—in between, you decided to work at a little company called SpaceX. So, yeah, space is kind of a thing for you.And then you were in a few different roles at AWS, and that's where I encountered you. And you had a way of talking to people on stage, or in a variety of different contexts, and building up proofs of concept, where you made a lot of the technical hard things look easy without being condescending to anyone, in the event that the rest of us mere mortals found them a little trickier to do. You did a great job of not just talking about what the service did, but about what problem it solves, and thus by extension, why I should care. And it was really neat to watch you just break things down like that in a way that makes sense. Now that you're over at Caylent as the VP of Cloud Strategy, the two things I see are, on the strength side, you have an ability to articulate the why behind what customers, and companies, and technologists are doing.The caution I have, and I'm curious about how you're challenging that, is: your default go-to for explaining things in many cases is to write some code that demonstrates the thing that you're talking about. Great for an engineer; as a VP, depending on how that expresses itself, that could be something that poses a bit of a challenge. How do you view it?Randall: You gave me some good advice on this. I don't know if you remember, but you said, “Randall, if you're in management, you got to make sure you're not just an engineer with an inflated title.” You know, “You have to lead. Good leaders aren't passive; they're active.” And I kind of took that to heart.I'm never going to stop coding, I'm never going to be hands-off keyboard, but one of the things that I've been focusing on lately, as opposed to doing pure implementation, is what is the Caylent culture and the Caylent way of doing things, and how can we onboard junior talent and get them to learn as much as they can about the cloud so we can cover the cost of their certification and things like that, but how do we make it so we're not just teaching them the things they need to be successful in the role, but the things they need to be successful in their career, even as they leave Caylent. You know, even beyond Caylent. When you're hiring somebody, when you're evaluating a cloud engineer, if they have Caylent on their resume, I want that to be a very strong signal for hiring managers where they're like, “Oh, I know, Caylent does amazing work, so we're going to definitely put this person in for an interview.” And then I've been an independent consultant many, many times. 
So, I've done work just off on the side, like, implementation and stuff for probably hundreds of companies over the last decade-plus, but what I haven't done is really worked with a consulting firm before.I have this interesting dilemma that I'm trying to evaluate right now, which is, you work with a very broad set of customers who have a very broad set of values and principles and ways of doing things. And you, as a consultant, are not able to just prescriptively come in and say, “This is how you should do it.” You know, we're not McKinsey; we don't come in and talk to the board and say, “You have to restructure the whole company.” That's not what we do. What we do is we build things and we help with DevOps.And so I've been playing around with this, so let me workshop it on you and you tell me what you think. It's—Corey: Hit me.Randall: At Caylent, we work within the customer's values, but we strive to be ambassadors of our Caylent culture. “Always be on the lookout for values, ideas, tools, and practices that our customers have that would work well here at Caylent. And these are our principles unless you know better ones.” I don't know if you know that phrase, by the way. It's an old Amazon thing.Corey: Oh, yeah. I remember that quite a bit. It's included in most of their tenet descriptions of, “These are ours unless you know better ones.” They don't say that about the leadership principle because—Randall: Right.Corey: —it's like, “These are leadership principles unless you know better ones.” Yes, several. But that's beside the point. The idea of being able to always learn, and the rest. You also hit on something that applies to my entire philosophy of employment.Something we do in this industry is we tend to stay in jobs for, I don't know, ideally, two to five years in most cases, and then we move on. But magically, during the interview process, we all pretend that this is your forever job, and suddenly, this is the place that's going to change all of it, and you're going to be here for 25 years and retire with a gold pocket watch and a pension. And most people don't have either of those things in this century, so it's a little bit of an unrealistic fantasy. Something I like to ask our candidates during the interview process is always, “Great. Ignore this job. Ignore it entirely. What's the job after this one? Where are you going?”Because if you don't plan these things, your career becomes what happens to you instead. And even if what you plan changes, that's great. It keeps you moving, rather than doing the same thing year after year after year after year. Early in my career, I worked with someone who had been at the company for seven years, but it was time for him to go and he couldn't for the life of him remember what he did years two through four, which—Randall: Yeah.Corey: —you may as well not have been there.Randall: There's a really good quote from the CEO of GitLab that… says, “At GitLab, we hire people on trajectory, not on pedigree.” And I love that. And—you know, I never finished college, so the fact that I've been able to get the opportunities I've been able to get without a college degree, and without a fancy name on my resume—Corey: We are exactly the same on that, but hang on a second; you have a lot of fancy names on your resume, so slow your roll there, Speed Racer.Randall: Okay. Okay, well, [laugh] but that's after, right? Like, I think once you land one, the rest don't matter. But I—Corey: I still never have. 
The most impressive thing on my resume is, honestly, The Duckbill Group.Randall: Well, I think that's pretty impressive now, right?Corey: Oh, it is—Randall: [laugh].Corey: —we're pretty good at what we do. But it doesn't have the household recognition that, you know, SpaceX does. Yet.Randall: Yet. [laugh]. I'm really loving building things and working with customers, but you're totally right. As you move into leadership, it's not your job to write code day in and day out. I know a couple people. So, Elliot Horowitz, who used to be the CTO over at MongoDB, he would still code all the time. And I'd love to be able to find a way to keep my hands-on keyboard skills sharp, but continue to have the larger impact that you can have in leadership for a larger number of people.Corey: I have the same problem because my consulting clients, it's pure advisory. I don't write production code for a variety of excellent reasons, including that I'm bad at it. And with managing the team here, as soon as I step in and start writing the code myself, in front of—instead of someone else whose core function it is, well, that causes a bunch of problems culturally as well as the problem of I'm suddenly in the critical path, and there's probably something more impactful I could be and should be working on.So, my answer, in all seriousness, has been shitposting. When I build ridiculous things that—you helped out architecturally with one of them: The stop.lying.cloud status page replacement for AWS.Randall: Oh, yeah, that you were regenerating every time? I remember that.Corey: Yeah. I wrote a whole blog post about that. Like, I have a Twitter client that I wrote the first version of, and then paid someone to make better: lasttweetinaws.com, that's out there for a bunch of things.My production pipeline for the newsletter. And the reason I build a lot of these things myself is that it keeps me touching the technology so I don't become a talking head. But if I decide I don't want to touch code this week, nothing is not happening for the business as a direct result of that. Plus, you know, it's nice to have a small-scale environment that I can take screenshots of without worrying about it. And oh, heavens, I'm suddenly sharing data that shouldn't be shared publicly. So, I find a way to still bring it in and tie it in without it being the core function of my role. That may help. It may not.Randall: No, it does help. There's this person in our industry, Charity Majors. I've been reading some of her blog posts about engineering management and how that all kind of shakes out, and I've tried to take as many lessons from that as I can. Because, right, you know, being in leadership is fairly new for me, I don't know if I'm good at this, I might suck at it. And by the way, if Cayliens are listening and you see me screw up, just shoot me a message on Slack, anytime, day or night. It's like, “Hey, Randall, you screwed this up.” Just let me know because—Corey: Or call it out on Twitter; that's more entertaining. I kid. I kid. That's what's known as a career-limiting move in most places. Not because Randall's going to take any objection to it, but because it's—people can see the things that you write, and it's one of those, “Oh, you're just going to call out your own internal company leadership in public?” Even if it's a gag or something, people don't have the context on that. It does not look good to folks who lack the context. I've learned as I've iterated forward that appearances count for an awful lot on things like that. 
I'm sorry, please continue. Randall: And the other thing that I've discovered is that you can have an outsized impact by focusing on education within your own company. So, one of my primary functions is to just stay on top of AWS news. So—Corey: Yeah. Me too. Randall: Exactly, right? So, I read literally every RSS feed from AWS, and I watched every single re:Invent video. So it's, like, 19 days' worth of video. And obviously, you know, I put it on double-speed, and I would skip through a bunch of things. But I go, and I review everything, and I try and create context with the people who are moving and shaking things at AWS and building cool stuff. And my realization is that I need to work to grow my network and connect with people who have accomplished very impressive things in business. And by leveraging that network and learning about the challenges they faced, it becomes a compression algorithm for experience. And I know that's an uncommon, unpopular opinion, that most people will say there is no compression algorithm for experience, but I think taking lessons learned and leveraging them within your own organization is probably one of the most important things you can do. Corey: I would agree with you, but I'm also going to take it a step further. “There's no compression algorithm for experience.” It sounds pithy, but it's one of the most moronic things I've heard in recent memory because of course there is. We all stand—Randall: It's called machine learning. [laugh]. Corey: —on the shoulders of giants. We can hire consultancies, you can hire staff who have solved similar problems before, you can buy a product that bakes all of that experience into it. And, yeah, you can absolutely find ways of compressing experience. I feel like anytime a big cloud company that charges per gigabyte tells you that there's no compression algorithm for anything, it's because, “Ah, I see what's going on here. You're trying to basically gouge customers. Got it.” Randall: I want to come back to that in one second, right, because I do want to talk about cloud networking because I have so many thoughts on this, and AWS did some cool stuff. But there's one other thing that I've been thinking about a lot lately, and one of the hardest things that I found in business is to not slow down as your organization grows. It becomes really easy to introduce excuses for going slower or to introduce processes that create bottlenecks. And my whole focus right now is—Caylent's in this hyper-growth period: We're hiring a lot, we're growing a lot, we have so many inbound customers that we want to be able to build cool stuff for, and help them out with their DevOps culture, and help them get moved into the 21st century, right? How do we grow without just completely becoming bureaucratic, you know? I want people to be a manager of one and be able to be autonomous and feel empowered to go and do things on behalf of customers, but you also have to focus on security and compliance and the checkboxes that your customers want you to have and that your customers need in order to be able to trust you.
And so I'm really looking for good ideas on how to, like, not slow down as we grow. Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100 megabyte binary that doesn't eat all the data you've got on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you. Corey: That's always an interesting challenge because slowing down is an inherent… side effect of maturity, on some level, and people look at it like, “Well, look at AWS. They do all kinds of things super quickly.” Yeah, they release new things from small teams very quickly, but look at the pace of change that comes to foundational services like SQS or S3, the things that are foundational to all of that. And yeah, you don't want to iterate on that super quickly and change constantly because people depend on the behaviors and, in some cases, the bugs, and any change you make is going to disrupt someone's workflow. So, there's always a bit of a balance there. I want to talk specifically about how you view AWS because people ask me the same thing all the time, and you stand in a somewhat similar position. You worked there, I never have, but you have been critical of things that AWS has done, rightfully so. I very rarely find myself disagreeing with you. You're also a huge fan of things that they do, which I am as well. And I want to be very clear for anyone who questions this, you work for a large partner now, and there are always going to be constraints, real or imagined, around what you can say about a company through whom a good portion of your business flows. But I have never once known you to shill for something you don't believe in. I think your position on this is the same as mine, which is—Randall: A hundred percent. Corey: I don't need to say every thought that flits through my head about something, but I will not lie to my audience—or to other people, or my customers, or anyone else for that matter—about something, regardless of what people want me to do. I've turned down sponsorships on that basis. You can buy my attention, but not my opinion, and I've always got a very strong sense of that same behavior from you. Randall: You're totally right there. I mean—Corey: [unintelligible 00:21:39] disagree with that. Like, “No, no, I'm a hell of a shill. What are you—thanks for not seeing it though.” Come on, of course you're going to agree with that. Randall: So, when I was at AWS, I did have to shill a little bit because they have some pretty intense PR guidelines. But—Corey: Rule number one: Never say anything at any time proactively. But okay. Please continue. Randall: No, no, I think they've relaxed it over the years. Because—so Amazon had very strict PR, and then when AWS was kind of coming up, like, a lot of those PR rules were kind of copy-pasted into AWS.
And it took a while for the culture of AWS, which is very much engineering-focused, to filter up into PR. So, I think modern-day AWS PR is actually a lot more relaxed than it was, say in, like, 2014. And that's how we have Senior Principal and Distinguished Engineers on Twitter who are able to share really cool details about services with us. And I love that. You know, Colm's threads are great to read. And then, you know, there are a bunch of people that I follow, who all have cool details and deep-dives into things, Matt Wilson as well. And so when you talk about being authentic and not just reiterating the information that comes from AWS, I have this balance that I have to strike that I was honestly not good at earlier on in my career—maybe it was just a maturity thing—where I would say every thought that came through my head. I wouldn't take a beat and think about, you know, how can I say this in a way that's actionable for the team, as opposed to just pure criticism? And now, I am fully committed to being as authentic as possible. So, when a service stinks, I say it. I am very much down on Timestream right now. For what it's worth, I have not tried it this month, but you know, I keep trying to use Timestream, right, and I keep running into issues. These lifecycle policies don't actually move things in the timely manner that you expect them to. And, you know, there's this idea that AWS has around purpose-built databases and they're trying to shove all of these different workloads into different databases, but a lot of times—you know, DynamoDB can be your core data processing engine, and everything else can flow from that. Or you can even use MongoDB. But throwing in Timestream and MemoryDB and all these other things on top of it, it becomes less and less differentiated. And a lot of these workloads are getting served by other cloud-native services. And anyway, that's a whole tangent, but basically, I wanted to say, you can expect me to continue to be very opinionated about AWS services, and I think that's one of the reasons that customers want us there: we will advise you on the full spectrum of compute, right? We're not going to say, “Oh, you have to go serverless.” There are still some workloads that are not well served by serverless. There's still some stuff that just doesn't work well with serverless. And then there'll be other workloads where EKS is where you want to be running things, you know? Maybe you do need Kubernetes. I used to go on Twitter all the time, and I would say things like, “You don't need Kubernetes. You don't need Kubernetes. Like, you only need Kubernetes at this scale. Like, you're not there yet. Calm down.” That's changed. So, these days, I think Kubernetes is way easier to deal with and it's a lot more mature, so I don't shy away from recommending Kubernetes these days. Corey: What is your take on, I guess, some of the more interesting global infrastructure stuff that they're doing lately because I've been having some challenges, on some level, building some multi-region stuff, and increasingly, it's felt to me like a lot of the region expansions and the rest have been for very specific folks, in very specific places, with very specific—often regulatory—constraints. These aren't designed to the point where anyone would want to use more than two or three in any application's deployment.
And I know this because when I try to do it, the [SAs 00:25:13] look at me like I'm something of a loon. Randall: So, there were a few really cool launches from AWS, this year at re—or last year at re:Invent. There was Cloud WAN or Cloud Wide Area Network, and there was SiteLink, and there was also VPC Access Analyzer. But when we talk about AWS's global infrastructure, I like going back to James Hamilton's talk. I don't remember if it's 2017 or 2018, but it was, “Tuesday Night Live” with AWS or something, and it walked through what a region is. And so the AWS Cloud these days is 26 regions, there are eight more on the way, and then there's something like 30 local zones. And I think that AWS's focus on getting closer to their customers—creating better peering relationships with different telecom providers, creating more edge locations, creating more regional caches—is transformative for what can be delivered. I play video games, so Riot Games gave a cool talk at re:Invent about how they use a mix of Outposts, and edge locations, and local zones to be able to get their Valorant gamers on to—Valorant is this first-person shooter game—get those gamers on the most local server that minimizes latency and pain for them. And that's the kind of future that I want to see us build towards, and that's something—I'm still incredibly bullish on AWS. I know Azure and Google are making improvements, and great for them for doing that because it raises all of us up to compete, but the thing that AWS has done that separates them from a lot of the other clouds is they have enabled workloads that literally would not have been possible without fundamental investment in global infrastructure. I'm talking things like undersea cables, I'm talking things like net-new applied photonics for fiber: There are researchers at AWS whose sole job is to figure out how to fit more stuff into fiber. So, James Hamilton did this talk, right, and he broke down what an availability zone is—and there are 84 availability zones in all now—and he walked through how an availability zone is not a single data center; an availability zone typically comprises multiple data centers that are separated from each other with different infrastructure and stuff. And then he broke down, like, the largest AWS availability zone is 14 data centers. And all new regions, by the way, have three availability zones, and those availability zones, they're meaningfully separated: more than a mile apart, but less than the distance where, like, speed-of-light effects become an issue. And that's where you can build services like Aurora, where you have this shared storage layer underneath the database engine. And that's how you can build FSx for Lustre, and EBS, and EFS. And, like, all of these services are things that are really only possible at scale. And Peter DeSantis talked a lot about this in his keynote, by the way, about the advantages of aggregate workload monitoring. I think AWS's ability to innovate from first principles is probably unparalleled in our global economy right now. That's not to say they will always be there, and that's not to say that they're always going to have that level of innovation, but for the last ten years, they've shown again and again that they can just go gangbusters and release new stuff. I mean, we have 400-gigabit-per-second networking now. Like, what the heck? Corey: And we still charge two cents per gigabyte when we throw that amount of capacity from one availability zone in a region to another. Which, of course I'm still salty about.
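For scale, a back-of-the-envelope sketch in Python of what that two cents per gigabyte adds up to; the monthly volume is an illustrative assumption, and the rate assumes the commonly cited $0.01/GB charged on each side of a cross-AZ transfer:

```python
# Cross-AZ data transfer cost, back of the envelope.
# Assumption: $0.01/GB charged in each direction ($0.02/GB total),
# and a hypothetical 50 TB/month of chatter between AZs.
gb_per_month = 50 * 1024          # 50 TB expressed in GB
cost_per_gb = 0.01 * 2            # charged on both sides

monthly_cost = gb_per_month * cost_per_gb
print(f"${monthly_cost:,.2f} per month")   # $1,024.00 per month
```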
Remember, my role is economics, so I have a different perspective on these things. Randall: Well, I like that Cloudflare and Google and Microsoft—and even Oracle, by the way; I don't know—at some point we should talk about Oracle Cloud because I used to be really down on them, but now that I've played around with it more, they're, like, coming up, you know? They're getting better and better. Corey: I am very impressed by a lot of stuff that Oracle Cloud is doing. With the disclaimer that they periodically sponsor this podcast. I think they're still doing that. The fun thing is that I have an editorial firewall. But I'm not saying this because they're paying me to say this; I'm saying it because I experimented with it. I was really looking forward to just crapping all over it. And it was good. And… “Who is this, really? Like, did someone just slap an Oracle sticker on something, pleas”—no. It's actually nice. But yeah, we should dive into that at some point. Randall: I want to say one more thing on global infrastructure, and I know we don't have a lot of time left, but we even have 800-gigabit-per-second networking on the Trainium instances now, by the way. Which is just mind-blowing. So, the fact that AWS has redone two-inch conduits—and I have this picture that I took at re:Invent that I can share with you later, if you want—of all their different fiber and, like, networking and switches and stuff. In aggregate, one of their regions has 5000 terabits of capacity. 5000 terabits. It's 388 unique fiber paths. It's just—it's absolutely fascinating, and it's a scale that enables the modern economy and the modern world. Like the app we're using to record this podcast, all of these things rely on the AWS global infrastructure backbone, and that's why I think they charge what they charge for, you know, these networking services. They're recouping the cost of that fundamental investment. But now, last year they announced 100 gigabytes free for S3 and non-CloudFront services, and then one terabyte per month for free from CloudFront. So, that's a huge improvement. It's a little late, but I mean, they got it done. Corey: I do want to note, in the interest of transparency and honesty, that the app that we're using to record this does, in fact, use Google Cloud. But again—Randall: Oh. Corey: —it's—yeah, again, it's one of the big ones, regardless. You can always tell which one it is, and not, “No, I'm running this myself on a Raspberry Pi.” Yeah. There's a lot that goes into these things. Honestly, I think the big winners in all this are those of us who are building things on top of these technologies—Randall: Yes. Corey: —because I can just build the ridiculous thing I want to and deploy it worldwide without signing $20 million of contracts first. Randall: Yeah. And going back to your point about multi-region stuff, I think that's getting better and better over time. There's some missteps. So like, let's take DynamoDB global tables, for instance—Corey: Which is not in every region, so it's basically, at this point, hemispherical tables. Randall: Well, even so, it's good enough, right? Like, it gives you the controls that you need to be able to slide that shared responsibility model and that shared cost model in the way that you need to. Or shared availability model. What is frustrating though, is that while this global availability is getting better and better from a software perspective, it's getting harder and harder from a code perspective.
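A minimal sketch of that software-side progress, assuming boto3; the table name is a placeholder, and the table needs DynamoDB Streams already enabled before a replica region can be added:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Add a replica region to an existing table using global tables
# version 2019.11.21. "orders" is a hypothetical table name; the
# table must have Streams enabled with NEW_AND_OLD_IMAGES for
# replication to work.
dynamodb.update_table(
    TableName="orders",
    ReplicaUpdates=[
        {"Create": {"RegionName": "eu-west-1"}},
    ],
)
```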
So, actually writing the code to take advantage of some of this global infrastructure is imperfect. And Forrest Brazeal, from Google Cloud, he spoke a little bit about this recently, and we had a cool Twitter discussion. Corey: Fantastic. I'm a big fan of Forrest. I'm glad that he found a place to land. I'm sad that it's not in the AWS ecosystem, but here we are. Randall: I mean, I'll follow that man anywhere. He's the Tom [Lehrer 00:31:48] of cloud. Just glad he's still around to keep making some cool stuff. Corey: I don't want to know what I am of cloud, ever. Don't tell me. Talk about it amongst yourselves, but don't tell me. Randall, I want to thank you for taking the time to speak with me. It is always a pleasure. If people want to learn more about what you're up to, where can they find you these days? Randall: caylent.com. I'm going to be writing a bunch of AWS blog posts on there, so go there. Also go to Twitter, @jrhunt. And if you need help building your cloud-native apps, need some DevOps consulting, or just a general 30-minute phone call to understand what you should do, reach out to me; reach out to Caylent. We're happy to help. We love taking these conversations and learning what you're building. Corey: And we will, of course, put links to that in the [show notes 00:32:30]. Thank you so much for taking the time to speak with me. I appreciate it. Randall: Thank you for having me on. It's great to see you. Corey: Until the next time. Randall Hunt, VP of Cloud Strategy and Solutions at Caylent. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice and an angry comment complaining about the differences between Greek and Roman mythology, and the best mythology is the stuff you have on your website about how easy it is to use your company, which is called Corporate Mythology. Randall: I love it. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. Announcer: This has been a HumblePod production. Stay humble.
About Wayne: Professionally, I'm a Vice President at Amazon Web Services (AWS) where I lead a set of businesses delivering cloud infrastructure services. In 2013, I founded and continue to lead the AWS Boston regional development center. I'm an always-curious entrepreneur who is passionate about building innovative teams and businesses that deliver highly disruptive value to customers. I love engaging people who build and deliver customer-obsessed solutions, as well as customers wanting to realize value from those solutions. I hold over 40 patents in distributed and highly-available computer systems, digital video processing, and file systems. Personally, I'm a proud dad to great people, I love to cook and grow things, it relaxes and grounds me, and I cherish finding adventure in the ordinary as well as the extraordinary. Links: LinkedIn: https://www.linkedin.com/in/wayneduso/ Twitter: https://twitter.com/wayneduso Transcript: Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense. Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r dot com slash screaming. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Way back in the winter of 2017, I went to my first re:Invent, and at that re:Invent, or shortly before, they announced EFS, Elastic File System, which is basically a NetApp in the cloud, killing some of my stuffing-a-NetApp-into-us-east-1 jokes.
And my initial, considered, reaction—because I'd been writing the newsletter for about six months at that point—was, “What a piece of crap.” And the general manager of the product said, “Hey, can we meet at re:Invent?” And showed up with, as I recall, a couple of engineers who had no neck, and I thought, “Oh, great, this is how I die.” Instead of what I expected, what happened was he asked a bunch of questions and took notes. And one by one, every issue that I had with the service—and it was lengthy—wound up getting knocked out in the next couple of years. And today, it's one of my absolute favorite services and I use it daily. That was an early but lasting impression of how a lot of my interactions with AWS were to go. And I'm very glad today to have that former GM and now VP of Engineering at AWS, Wayne Duso, here to suffer more of my slings and arrows. Wayne, thank you for joining me. Wayne: Corey, it's always a pleasure. Thank you. Corey: It really was a transformative moment for me, just because I was still finding my own voice in this. Because my big fear then was, “No one's going to read it; no one's going to listen. I'm going to starve to death and have to get a real job and get fired some more.” And it was just about getting out there at any cost. I didn't have the reach then that I do now, and I was a lot more cynical, and in some ways, directly opposed to service launches. I try not to do that anymore; don't always succeed. But I never got the sense that you took any of that feedback personally. And in some cases, you set me straight when I was wrong on things, and in others, you not only listened, you agreed with me and took steps to make it better. This is not me shining you on. This is very much the course of EFS. Wayne: Yeah, Corey, you know, your feedback was super important. It was early days, and we don't pretend that we get everything right the first time. You know, we don't. You write about how we don't get it right the first time quite often. And it's appreciated because, you know, as engineers, as technologists, as leaders, you're always looking for input. Input is one of the most valuable things we can get. You know, often people don't provide us input. And when I met you and you had your long scroll of input for us, for me it was a treasure trove. But it really was a treasure trove. And to this day, whether it's you or, you know, the next you, those scrolls are still treasure troves because it means that somebody has a different view than what I currently have. Corey: It stuck with me because suddenly it came crashing home to me that a lot of the criticisms that I was lobbing against AWS weren't just me shouting into the void. They were being heard. And also, it really was one of those reaffirmations back then; it's like, “Oh, yeah, making fun of Amazon Web Services; they're a division of a company that's however many trillions of dollars it was at the time,” and, “That's not a person. There are no people there. It's just a big company.” And the dawning, creeping realization was that companies are made of people. And instead of just saying, “This is crap,” I've got to be able to articulate why that is.
And I guess the one enduring criticism that I had about EFS that still holds true on some level, is—it has nothing to do with the capabilities of the service, but more to do with the pattern of if I'm building something to work in the cloud on day one using what is, effectively, NFS that attaches to a bunch of different instances simultaneously—or now containers or Lambda functions—great, that feels like it's not the direction that cloud architecture has historically gone, but now that the capability is there, that's starting to shift, and it's coming into its own once again. So, it's just… it was a service that I didn't fully understand then—I'm not sure that I still do—but I definitely see the use of it and the utility of it. And by just sitting here and doing effectively nothing, it gets better with time. And if there's a better story about cloud, I'm not sure what it is. Wayne: You make it sound like a wine. If you, you know, if you do a proper job mixing those grapes, eventually you put it in a bottle, it gets better and better as time goes on, but. Corey: And eventually becomes vinegar, and we can list, like, three or four of those services, too, but let's be charitable here. Wayne: [laugh]. We're not going to go there. So, thank you for popping the cork on this bottle of wine and drinking it on a regular basis. Files are everywhere. You know, when I came to the company in 2012, and I was handed a question, “How should we build? What should we build?”—in terms of a file offering—I asked why in the same way that I'm sure you asked why when we launched. And the truth of the matter is that it doesn't matter what type of operating system you're on or what you're doing, files are a fundamental abstraction. And every operating system—every language—has an interface, which is file-based. It's a really good abstraction. And you know, objects are a great abstraction, blocks are a great abstraction, like, they all exist for a reason. They all serve a purpose. If they didn't, they would have gone away. And files have, you know, been around for at least 50 years, and honestly, they'll be around for at least another 50, and probably beyond. So, it was really important to build a capability that customers—builders—could use straight out of the tools that they use every day without having to go through any transformation. It was important. Corey: When I say that EFS is a contender out of maybe four others for my favorite service, people think that it's because I love files or I love storage, and okay, fine, but that's not the real reason; that's sort of irrelevant. I mean, I like SSL certificates too, but ACM isn't on that list either. What I found valuable about it, what I love about it as an exemplar service is that if I go through the console wizard to spin up an EC2 instance—which of course I do because the ultimate form of managing cloud resources is using the console and then lying about it; it's called ClickOps—and as I go through that process, adding an EFS file system to it is built into that wizard and it just works. When you spin up an EFS volume, it automatically configures AWS Backup to begin taking backup snapshots of what's going on in there. And there's a bunch of niceties built into that.
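A minimal sketch of that spin-up path, assuming boto3; the token, subnet, and security group are placeholders, and in practice you would wait for the file system to report available before creating mount targets:

```python
import boto3

efs = boto3.client("efs", region_name="us-east-1")

# Create a file system with automatic AWS Backup enabled,
# mirroring what the console wizard wires up by default.
fs = efs.create_file_system(
    CreationToken="example-fs",        # idempotency token
    PerformanceMode="generalPurpose",
    Encrypted=True,
    Backup=True,                       # opts into AWS Backup
    Tags=[{"Key": "Name", "Value": "example-fs"}],
)

# One mount target per subnet/AZ lets instances, containers,
# and Lambda functions in that AZ attach the same file system.
# (Poll describe_file_systems until the state is "available"
# before making this call.)
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",       # placeholder
    SecurityGroups=["sg-0123456789abcdef0"],   # placeholder
)

# The tiering behavior described next is a lifecycle policy:
# files untouched for 30 days move to Infrequent Access, and
# move back to standard storage on their first access.
efs.put_lifecycle_configuration(
    FileSystemId=fs["FileSystemId"],
    LifecyclePolicies=[
        {"TransitionToIA": "AFTER_30_DAYS"},
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
    ],
)
```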
Recently, at re:Invent—or pre-re:Invent—additional changes were rolled out to EFS that enable automatic data tiering where, “Oh, that file hasn't been touched in a while; we're going to move that to the infrequent access tier and charge you less for it.” And this all happens transparently in the background. It is a service that gets better without requiring active customer interaction, and you're sort of swimming against the stream of the typical AWS service approach by talking to other service teams and integrating into them in a way that is transparent to the customer. And I'm looking at this and I'm tapping the screen going, “More like this, please.” How did you get there? Wayne: I'm a lucky guy in many ways. You know, I use this term inside of teams on a regular basis, which is, “We stand on the shoulders of giants.” And EFS was not the first service to launch at AWS; we know that… it wasn't SQS, it was S3. And—Corey: We're talking beta or general availability—Wayne: [laugh]. Corey: Those are the two teams that will wind up fighting to the death, and I just stay out of it. Wayne: I know. I just wanted to beat you on that one. And so many great services, whether it's Dynamo or S3 or EBS, came before the services that I built for customers—or my teams have built for customers. And so I get to stand on their shoulders and look far out onto the horizon; and also, I get to look backwards at mistakes or issues that we may have made, you know, earlier. And if I make the same mistakes, shame on me. I should be able to ensure that what we produce takes from the lessons that have already been learned. So, for EFS as an example, I make sure that I look at all the lessons that came from EBS or S3 or Dynamo in terms of their scale and their usability and so on and so forth and say, “How do we make sure that we're as good if not better for our customers?” And at some point, somebody will stand on the shoulders of EFS and they'll look forward and say, “Oh, we got to do better than what they did in their first couple of years.” And I look forward to that. Corey: I want to be clear that your portfolio has expanded significantly. It turns out that you were the GM and now you're a VP of Engineering. And so what changed? “Oh, same job, just different title.” And that is very much not true. You own a bunch of other services as well—largely storage-oriented—including the snow family, which means that you are the person to, basically, harangue until I get to check that item off my lifelong bucket list: beeping the horn on a snowmobile. It's going to happen someday; I just don't know how or when, but I feel like you're someone who can help me set that up. Wayne: But I live in a part of the country where we need snowmobiles, so it behooves me. [laugh]. Corey: Oh, yes. Growing up in New England, I remember those days, too. It's, “Okay. Don't plow the street. That's fine. I'll just cross-country ski to school.” Wayne: I have stories but we won't go there. Corey: Oh, yes. Wayne: You know, Corey, in my tenure, which is now roughly a little over nine years at AWS, I've had the opportunity to meet with a lot of enterprise customers. I have a very rich enterprise background from my career, so it was very natural for me to engage cohorts of customers, storage administrators, IT administrators, application administrators, network administrators, all of which have a rich history in success in the enterprise.
And in speaking with them, it became incredibly clear that there were a series of enterprise-based services that needed to be built for cloud-scale, the cloud model of consumption, that just didn't exist. And I'm not a big fan of creating services for the sake of creating services, just like you are not a big fan of us doing that, so I wanted to make sure that everything we built was filling a need that could not be filled any other way. And in that nine years, my teams and I have built roughly ten services covering storage, edge compute, edge storage, and data services like data protection, data movement, data management. And each of these services has deep roots in those enterprise cohorts. Corey: One thing that's become obvious to anyone who pays attention in the space has been that when we look at the sweep of history when it comes to technology, it's that the tide always rises. When I first started playing around with technology, firewall engineer was a specific job. Now, any network administrator is expected to be able to handle that just in due course. And when you're talking about storage, the same thing happens. Now, there have been people who spent 30 years of their career as storage admins or working on specific technologies, and now that cloud is disrupting so many aspects of this, there's a tendency that technologists have to push back against anything that disrupts the thing that they work on because they equate their identity to the thing that they build. And let me be very clear here. I don't think most people are immune to this. I know I'm not. It's one of the reasons I tend to get so irritated about things that are disruptors to the way I thought about doing things. Why do you think I hate containers so much? It's because, well, that's something that sounds like a different paradigm, and I'm not good at that, but I'm very good at this older thing. And letting go, like, I've done that a few times and changed my focus throughout my career, but it's always been challenging to do it. When I tell the story, “Oh, yes. I saw this thing and then I pivoted”—yeah, it wasn't that easy. There was angst and pain tied to it, and the constant awareness that I'm going to become a dinosaur if I don't change my area of focus—and by the way, I'm 26 years old at the time—it was a hard thing to do. Storage is such an important thing for so many companies in so many environments, and such a nuanced, deep area, that there is significant disruption happening there. How have you found those conversations to go with the folks at your customers who probably identify themselves as, “I hate the cloud. We should build more data centers.” Which is not actually what they're saying, but it's how it's expressed. Wayne: Yeah. Evolutionary change is something that the folks that you're talking about understand; they embrace evolutionary change; they're curious people, they love to learn; they're passionate about what they do, and they're passionate about their customers. It's the revolutionary change that is hard. And they often view the cloud consumption model versus the traditional IT consumption model of receiving equipment, setting it up, putting it on the floor, so on and so forth—they see that as an evolution, where they saw cloud as a revolution. And they don't want to necessarily embrace such a big change. It's part of my job and the job of my peers to have those folks, the storage administrators, the application administrators, understand that it's not as revolutionary as it may seem.
And I want to give you an example of how we've done that. We've talked at length about EFS; EFS has a sibling and it's known as FSx. And that really stands for File System X, which is any file system. What does that really mean? Corey: I figured that one out a—what was it? It was something like three years after FSx launched, two years, something like that—I thought FSx was some storage term I'd never heard before. And nope, I was over-complicating it. It turns out—surprise—I don't know everything either. Wayne: Yeah, well, we were wrestling really hard with what we should name FSx. Look, I'll tell you that story in a second, but what FSx is, is it's a sibling service to EFS. EFS was built to be a cloud-native, set-and-forget, super-simple file system on AWS. Now, with that, there are 30 years of features that we did not implement when we launched EFS because the vast majority of those aren't needed by everyone, so we didn't complicate the solution. That being said, if you think about storage administrators and application administrators in the enterprise, today, they use very specific file systems and the characteristics of those file systems, whether it's performance, or durability, or—more importantly—management APIs, Snap APIs, replication APIs, all sorts of various data management capabilities, data service capabilities. FSx is a service that was designed to bring those file systems to AWS as fully managed offerings so they could be consumed in a cloud-native fashion. You've seen the launch of four of those to date. The first one we launched was FSx for Windows File Server so that customers that used Windows Server on-prem could lift and shift their applications and workloads to a like offering on AWS. It's bit-compatible. The second was Lustre. We talked to a whole bunch of customers with amazing use cases—you know, FMI, that is helping create cancer treatments that are specific to individual patients; they needed to be able to run high-performance workloads on AWS. Lustre was a great solution for them, but everybody knew that Lustre was a little hard to run on-prem. It took a lot of energy, took people to keep it up and running. But we took all of that away from them and provided them a fully managed Lustre offering, where they just create the file system, load the data into the file system, and go, and they worry about nothing after that. Those were the first two. With the success of those, we heard customers coming to us—the same customers—saying, “Hey, listen. I got a floor full of NetApps. I would love to run my NetApp-based workloads and applications on AWS. Can you help us?” And we partnered with NetApp—it was a very deep partnership for two years—to create a bit-compatible, like-for-like, no-excuses NetApp offering on FSx. And then we quickly followed it a couple months later with a ZFS offering, which is FSx for OpenZFS, because for the folks that aren't running NetApp, there's a whole bunch on-prem that are running ZFS-based file systems, and they rely on the data management APIs of that file system. So, you know, for this cohort, we took what seemed to be a revolutionary change, and we turned it into an evolutionary change. And now our job is to go speak to all of those folks and have them understand that they can make that switch without losing the 20 years, the 30 years of expertise, without losing the trust of their customers, the folks that are running on their storage, because they can guarantee them that what they have today is what they will have when they move.
Super important. Corey: I think that it does come down to trust, and it comes down as well to the challenge that you're in when it comes to storage. As we look at the broad sweep of where cloud business is coming from, it's easy to sit here and pretend that we're all somehow—we're all doing net new; we're building out this thing in a garage somewhere. “I have an idea.” “What is it?” “Twitter for Pets.” “Is that a good idea?” “Absolutely not, but I just raised 200 million in seed funding, so we're going to do it anyway.” A lot of this stuff is existing legacy workloads. And legacy is, of course, a condescending engineering term for, “It makes money.” But that is what you have to be able to address because to move your workload to cloud is a heavy lift. It becomes borderline impossible with, “Oh, and to do that, you're going to have to completely re-architect the entire data layer, because that thing you're doing right now? Not so much.” ONTAP in the cloud as an FSx NetApp offering was transformative for me. Because back when I was in data center land, NetApp was the only way I would willingly run NFS in production. And I ran production MySQL databases on top of it. Professional advice, if you're listening to this elsewhere: don't do that. But it can be done, and it was amazing. And WAFL remains one of my favorite file systems ever. And I kept joking that I just wish I could get it in AWS, but they won't let me shove a filer into us-east-1. I know because I've asked. Well, you went ahead and did that for me. So, thanks. Wayne: You're welcome. It's our pleasure. [laugh]. Corey: I really hope that is still relevant to places I worked back in 2007, but let's face it, “This is a temporary fix,” and 20 years later, it's still load-bearing in production. So, here we are. Wayne: You know, you'll never hear me use the word ‘legacy.' And in fact, within the company, we'll be in rooms and people use the word legacy—it's a very popular word; people love to use it—and what I often will tell them is that if an application workload is important to a customer and it is helping run their business, there is nothing legacy about it. It is current, it is real, and we need to think about it in the way they think about it, not in a way which has us in any way deprecate its importance. The customers will deprecate its importance if that's important to them, so our job is to embrace what they have and help them, if they so choose, to bring those workloads to AWS without change. I'm going to give you an example. I'm not sure I can use the customer's name because I often lose track of who I can talk about in public and who I can't. So, I [crosstalk 00:20:32]—Corey: I long ago stopped paying attention to the details of NDAs, which sounds like a terrifying statement to make until you realize the way I do that is I just assume every conversation I have is confidential until I can affirmatively prove otherwise. And I'll ask people sometimes like, “Hey, remember the thing you said to me? Can I quote that in public?” And they look at me strangely, and say, “It was on a podcast and we put it out to the world. Yes. Yes you can.” Oh, good. Wayne: [laugh]. Well, in this particular case, I can tell you the country and I ca—and actually the continent in this case. It was a customer down in Australia, and they ran into a situation where they needed to move out of their data centers, and they needed a place to go, and AWS was already provided to them. And they moved 40 Windows-based applications to AWS over a weekend.
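A minimal sketch of the create-and-go workflow described here, assuming boto3; every identifier is a placeholder, and the Active Directory join is what lets lifted-and-shifted Windows apps authenticate unchanged:

```python
import boto3

fsx = boto3.client("fsx", region_name="ap-southeast-2")

# Create a fully managed Windows file share joined to an
# existing Active Directory. Directory, subnet, and security
# group IDs below are hypothetical.
fsx.create_file_system(
    FileSystemType="WINDOWS",
    StorageCapacity=1024,                      # GiB
    StorageType="SSD",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    WindowsConfiguration={
        "ActiveDirectoryId": "d-0123456789",   # placeholder
        "DeploymentType": "SINGLE_AZ_2",
        "ThroughputCapacity": 32,              # MB/s
    },
)
# Once the file system is available, clients mount it like any
# other SMB share, which is why applications work as they did
# on-prem.
```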
I feel bad for the folks; hope they get a few weeks off—at least a few days off after that event. But they were able to move almost their entire business to AWS and FSx for Windows over a weekend because the experience simply was: create the file system, spin up the application, mount the file system, and go. And it worked exactly as it had on-prem. Exactly as it had on-prem. When I hear stories like that, as a builder, as an engineer, as a human, I am incredibly happy. It is a day worth living when you know that you've helped a customer. Corey: When you come in and look at an architecture like that, see it for the first time, and you look at it like, “Hey, you're using the cloud like it's a data center. What's the deal here?” And if you say that in a condescending, insulting way, they're sitting around talking about what an amazing achievement something like that is—and let's be clear; having done those projects, it's an amazing achievement—but to have someone come in with no context, “Oh, you should have done this in a more cloud-native way.” It's, “Thank you, Seymour. Yes, if we were building this stuff bespoke today, greenfield, we would have radically different constraints and radically different capabilities, but we don't have a time machine, so instead we have to move forward and we can't just burn everything down every 18 months and start over from scratch.” There's value there. Wayne: Yeah. And they have the ability over time, Corey, to—I'm using this term lightly—to modernize. Because I don't really know what that means; I'm a big fan of mid-century modern furniture, which is no longer modern. [laugh]. But it is lovely. Corey: A glimpse of the future that didn't happen. An alternate path, a speculative fiction expressed through furniture. I hear you. Wayne: But what I'd like to make sure people understand is that they can move today. They can utilize these capabilities and these services today and move, and then they can take their time in how they want to evolve those applications based on the needs of their customers, based on the needs of their business. I do not want to slow people down. The entire intent of the portfolio that I've had the privilege to build and provide to customers over these years, its intention is to enable customers to move as quickly as they want, and to then take whatever time they need to evolve that, based on their business needs. It's really that simple. Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100 megabyte binary that doesn't eat all the data you've got on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you. Corey: There's a lot that you said that I want to dive into, but the piece that I want to focus on specifically is you said you'll never use the word legacy.
And I'm going to challenge you on that because in one of our conversations, I asked you what product you enjoyed building the most, and your answer was, “Leaders.” And that gets into a different form of legacy. And yes, I'm playing semantic games here, but it's the question there of what are you going to be remembered for, because, God willing, none of us are building technological solutions that are still going to be in common use 40 years from now, but I don't necessarily know that the same can be said of people. Tell me more about this because you no longer oversee a product; you oversee a bunch of products. And something I learned the hard way is that when you become management, and later different forms of management, you can do very little of the work yourself, if any. Your only tool is delegation, and that means you need to have the right people to delegate to, which means hiring and handling leaders is effectively your IDE, for lack of a better term. Talk to me about that. Wayne: It's a really important point. And one of the mental frameworks I put in place for myself a handful of years back, which guided me to AWS in fact, is ‘who,' ‘how,' and ‘what.' Who I work with is most important to me, how I do my work comes second, and what I work on comes a distant third. And the reason for that is so simple to me now—it wasn't when I was a young engineer, where ‘what' was the most important thing on the planet; you know, what I worked on defined me. And it took me years to understand that what I work on doesn't define me; it's who I work with and how I do that work that defines who I am. You know, a really, really lousy day with great people is still a good day. And a really great day with people you don't want to be around is still not a great day. Corey: I think that a lot of companies, maybe all companies, tend to get a somewhat unfair reputation once they hit a certain point of size and scale. There have been all kinds of exposés in various places about how a company X—and it doesn't matter who X is; it can be Amazon, it can be any company people have heard of—is a terrible place to work. But I was talking to a friend at AWS recently who was coming back from parental leave, and they told me, “So, get this. AWS changed their policy while I was out on parental leave. I have another month of it to take later in the year, and I'm really looking forward to that.” It's like, “That's amazing.” As opposed to the constant stories of, oh, this thing is awful and this thing is terrible. It's very hard to see from the outside what a company is actually like. And I apply that to me. I only have the faintest glimmer of what it's like to actually work at AWS. But talking to the people that I get to talk to and seeing how things are built, is it perfect? No. No place is. But I hear stories about different approaches to leadership and different teams, and, on some level, I become intensely grateful that I never tried to work on any of those teams because I would have given up everything I was building to have the chance to be a part of those groups, and then where would the industry be? Certainly without my snark, and that—we would all be the poorer for it. Wayne: That's true. Corey: But there are always pockets of amazing things, same way there are pockets of terrible things. AWS, at its—what—however many tens or hundreds of thousands of employees it has now, doesn't have a corporate culture. It has hundreds of thousands of corporate cultures because it is a team-by-team structure there.
And culture, from a principles perspective, has to flow from the top, but then you have management and how they run their organizations and their teams, and the blast radius attached to that is tremendous. And that's the sort of thing that always scared the hell out of me when I was debating: Do I stop being an independent contractor—independent consultant—and start hiring people here full-time? Because the biggest blocker I had was that, yeah, if I say something wrong and get smacked off the earth by AWS, okay, great. I had it coming. Asking people to risk the next phase of their careers on me saying or doing the right thing was a hell of a responsibility. That's the stuff that keeps me up at night. Not that I'm going to have a wrong take or something. It's the letting down the people who are depending on me to get things right. Wayne: Yeah. So, leadership is a—it's a lot of fun; it's a lot of responsibility as well. And so to your question, the thing that I will build that will be the most important—the last thing I build—is the leaders I leave behind. Those leaders will go on for decades and build more leaders. They'll build more products, they'll build more businesses, they'll do great things, but my job is to ensure that I build the right leaders who ensure that the culture that we do have at AWS—or wherever else they decide to go in the fullness of time—they will carry with them the best elements of the AWS or Amazon culture, through AWS or into other companies. I'll use the word: That will be my legacy. And right now, I look at doing that, not just with folks I bring into management roles, but I look at that in terms of the young engineers, the young data scientists, the young DevOps folks that we bring into the company, and say, “Where can I find incredibly passionate and curious learners, regardless of their background, regardless of where they come from, who can, in the fullness of the next five to seven years, become those leaders?” At that point, we will have a company that continues to represent the people, continues to represent the customers and the communities we serve. We will understand our customer. That, to me, is what it means to be customer-obsessed: to understand your customer. And you can understand your customer best by being a representative population of who they are. Corey: One thing that we often decry in this industry is the pox of short-term thinking and over-emphasis on next quarter's numbers. Think longer-term. But when people say that, they're often thinking three to five years out. You're talking about something that spans decades, and that is borderline unheard of in corporate life. Wayne: Yeah, well. Thanks. [laugh]. Corey: [laugh]. I mean, I had to think about this stuff a fair bit myself recently because, let's face it, I have rendered myself completely unemployable. I went into this knowing it was, like, burn the boats behind me because this, for better or worse, is the last job I will ever have. And—maybe because I'm going to get killed in two months; we don't know—but it's—I'm very good at antagonizing people with a lot bigger spite budget than I have—but that is how I have to think about these things. In the context of something at your scope and scale—because if your storage service or services have a bad day, so do all of your customers.
That is something that weighs on the mind of every Amazonian I've ever spoken to, including someone who joined their first job out of school two weeks ago. It's a culture of not fear, but awareness of the weight of responsibility. And some folks obviously carry that better than others, but it is one of those things when I start talking to people who are new Amazon employees, and I'm starting to be able to categorize them mentally by how they talk about certain things: the “You're going to go far at this company,” versus the, “Let's see how this plays out.” You can pick up on some of these things, in the fullness of time. Wayne: Having this level of responsibility, knowing that everything from entertainment to critical life care is dependent on the decisions you make and on the product you build and operate, is a great responsibility. I can speak personally: it motivates me every day. You know, I can probably pick three days out of the near ten years I've been building at AWS where I just wanted the day off. [laugh]. I didn't want that responsibility. Now, of course, we have the leadership I just talked about that runs the services every day, who was there to make sure that on those three days that I didn't show up, everything was going to be fine. And it, of course, was fine. But doing what we do is hard work. But I've never found anything more rewarding. And just speaking to a customer—a single customer—who's had a great day, or a customer who hasn't had a great day, but you've done the right thing and their trust in you is strong—they're unhappy with you, but their trust in you is strong—that is also a great day. Corey: So much of what happens in cloud is thankless. We started off this episode talking about the EFS integration into other services. It is just this thing that I remark upon about how wonderful and great it is, but for most customers, it's like, “Okay.” They just click it and it's great. When it's not there, they find it obnoxious and challenging, but when it's there, it's just, “Okay, this is working as it should,” and they move on. So much of the work and challenge and victories are unsung, and I try not to pass those things idly by and just take the time to appreciate the things that I see. Because as I said, companies are made of people, and there's a tremendous amount of innovation and improvement that's going on constantly. I sometimes say—probably not enough—that 90 to 95% of what AWS does is excellent. That missing gap to get to 100 is both frustrating and, honestly, rife with opportunities for hilarity. So yeah, I don't want to spend all my time talking about how great all this stuff is—I should just work in marketing if that's the case—I want to talk about what it takes to go from great to perfect, and you're never going to get there, but that's okay; it means I'm going to have a job for as long as I want one. Wayne: You know, there's a term that we often use, and that is: for our customers, we need to operate our services so that they're indistinguishable from perfect. That's a tall bar. Corey: Mistakes show. Wayne: Yeah. But it is our responsibility. And frankly, as builders and strong owners, it comes with a lot of fun. It's really hard, but it comes with a lot of fun.
Like, being able to do that… you know, this conversation we're having right now is likely traveling over some piece of the cloud… everything that was enabled during… the pandemic, which has been very challenging for a lot of people—for everybody—for your haircut, for a while, it was very challenging.

Corey: Oh, that was one of the hardest parts of the pandemic. I expect to find it in the list of pandemic-related fallout on Wikipedia someday.

Wayne: [laugh]. And, you know, what we do enabled a large part of people's personal lives, business lives, the economy to continue at some level of normalcy, which if it did not exist, it would have been a very different place. And we're very grateful to be able to do that. We're very proud that we've been able to do that at the scale we do during the last couple of years. It's taught us a lot of lessons. It's been fantastic.

Corey: I'm very light-hearted about some of the workloads I put on AWS. I have a Last Tweet in AWS Twitter client that basically does shitposting threads in long-form. And it's great. And if it winds up breaking, then there's no big deal. I don't care about that.

But I don't have to care about that; you do have to care about that, because just because I don't take one of my workloads seriously, you as AWS can never have that context. And that's why you take every weird blip, every, “I don't understand this,” or, “This hasn't lived up to my expectations,” as if it were a life-critical system, because for some customers they are. And I have always appreciated how—and been bemused at times by how—deadly seriously AWS folks take my complaints about, “Yeah, my shitposting app isn't posting shit quite the way I want it to.” [unintelligible 00:36:05] anything but the utmost of professionalism and respect when I'm talking about service gaps and challenges I'm having building and deploying things. And I feel a little bad at times, just because I'm making people care so seriously about things that don't actually matter for crap. But I've always appreciated it.

Wayne: Our frame of reference is the millisecond and the penny. We worry about every penny you spend, and we want to make sure you—

Corey: Oh, there are times that, in some cases, for some services—I believe it was trillionths of a cent is the granularity a couple of them have in the billing system. So, you all care about far smaller denominations than pennies.

Wayne: Well, I might be a little generous in what I'm saying right now, but for the frame of reference for the listener, you know, we don't think about things at the quarter and the million dollars. That's not our frame of reference. We think about things in terms of the millisecond and the penny. And you know, yes, you're right, we now think more in microseconds than we do milliseconds, and we do think in fractions of pennies, not pennies. But it's a frame of reference.

What matters is the details. And no workload is unimportant. If it's your workload, it's just as important to us as any other workload. Even if it does poke snark at us. We're okay with that.

Corey: And sometimes there is the idea of a memento mori, or someone yelling at the emperor that they have no clothes. The problem is, in the story of the emperor not having any clothes, the kid would at least occasionally shut up once in a while, and I never seem to, so there is that part of it, too.

Wayne: Hope springs eternal.

Corey: Exactly. Wayne, I want to thank you for taking so much time to speak with me today.
If people want to learn more about what you're up to, and how you view these things, where can they find you?

Wayne: Well, the two places they can find me most often, Corey—when I'm not at my desk or at a local bar—they can find me on LinkedIn, and it's Wayne Duso. There's only two of us on LinkedIn, so find the guy who wears glasses and has no hair, and that's me. And the second place they can find me is on Twitter, where my handle is not obfuscated at all. It's @wayneduso.

Corey: And we will, of course, put links to both of those in the [show notes 00:38:05]. Thank you so much for your time today. I really appreciate it.

Wayne: Corey, it's always a pleasure. And I'm looking forward to sharing maybe a drink at that bar with you soon.

Corey: I look forward to it. I can't wait to go back out to bars again. Oh, my God. [sigh]. Wayne Duso, VP of Engineering at AWS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice telling me that I'm completely wrong when it comes to using EFS for things that I should instead be using a better storage system that's more cloud-native. Like Route 53 [unintelligible 00:38:48].

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
Eoin and Luciano continue their series about event services. In this episode, they chat about EventBridge and explore why this AWS service has such great potential for event-based serverless applications. This episode presents some interesting examples of when and how to use EventBridge. It also covers all the different classes of events that you can manage with EventBridge: AWS events, third-party events, and custom events. We discuss limits and pricing and, finally, we show how things can go wrong and how much you can end up paying for it. We conclude the episode with some tips and resources to avoid shooting yourself in the foot and to get good observability when using EventBridge.

In this episode, we mentioned the following resources:
- Our previous episode about all things SQS: https://www.youtube.com/watch?v=svoA-ds8-8c
- Our introductory episode about what services you should use for events: https://www.youtube.com/watch?v=CG7uhkKftoY
- List of AWS services that can trigger EventBridge events: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-service-event.html
- An example of how to make HTTP calls directly from EventBridge (Sheen Brisals): https://medium.com/lego-engineering/amazon-eventbridge-api-destinations-demystified-part-i-23fa70d9a04d
- How to test when using EventBridge (by Paul Swail): https://serverlessfirst.com/eventbridge-testing-guide/
- EventBridge CLI tool: https://github.com/spezam/eventbridge-cli
- Lumigo CLI: https://github.com/lumigo-io/lumigo-CLI#lumigo-cli-tail-eventbridge-bus
- EventBridge Atlas: https://eventbridge-atlas.netlify.app/
- EventBridge Canon: https://eventbridge-canon.netlify.app/
- Accelerate Serverless Adoption with EventBridge (talk by Sheen Brisals): https://www.youtube.com/watch?v=sTZpoSGOSOI
- Series of articles by Sheen Brisals on EventBridge: https://sbrisals.medium.com/table-of-contents-set-pieces-16c1ca1ecb33

This episode is also available on YouTube: https://www.youtube.com/AWSBites

You can listen to AWS Bites wherever you get your podcasts:
- Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017
- Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw==
- Breaker: https://www.breaker.audio/aws-bites
- RSS: https://anchor.fm/s/6a3312a0/podcast/rss

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige
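To make the custom-events class concrete, here is a minimal sketch of publishing one to a bus with boto3. The bus name, source, and detail-type are hypothetical, not taken from the episode.

```python
import json

import boto3

events = boto3.client("events")

response = events.put_events(
    Entries=[
        {
            "EventBusName": "orders-bus",    # hypothetical; omit to use the default bus
            "Source": "com.example.orders",  # custom events carry your own source
            "DetailType": "OrderPlaced",
            "Detail": json.dumps({"orderId": "1234", "total": 42.5}),
        }
    ]
)

# put_events is not all-or-nothing: always check for per-entry failures,
# since silently dropped events are one of the ways things go wrong here.
if response["FailedEntryCount"] > 0:
    for entry in response["Entries"]:
        if "ErrorCode" in entry:
            print(entry["ErrorCode"], entry.get("ErrorMessage"))
```

Rules, targets, and archives are configured separately on the bus; the publisher only needs the entry structure above.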
An airhacks.fm conversation with Mark Sailes (@MarkSailes3) about: check out episode "#168 Serverless Java on AWS" with Mark, AWS Lambda Powertools for Java was initiated in 2020, AWS Lambda Powertools for Java started with logging, tracing, and custom metrics, the major use cases for AWS Lambda Powertools, Lambda best practices are implemented as modules, Lambda Powertools Java Logging and structured logging in JSON format with additional context provided with an annotation, including the correct amount of data, logging writes to standard out, Lambda, metrics and the AWS CloudWatch Embedded Metrics Format (EMF), AWS Lambda and metrics scraping, Lambda Powertools Java Metrics, providing Lambda metrics to AWS CloudWatch via EMF, synchronous AWS CloudWatch calls are expensive, secrets and configuration management with parameters, AWS Systems Manager Parameter Store support, parameter caching, Lambda Java-like tracing with AWS X-Ray, Lambda Powertools annotation for X-Ray, adding exceptions to AWS X-Ray, adding correlation ID support for cross-Lambda logging, AWS Lambda Powertools for Java is an incubator, support for CloudFormation custom resources, the SQS and SNS message offloading to S3, validation support of business objects with JSON-Schema and JMESPath, the killer use case for AOP, writing ugly code for performance Mark Sailes on twitter: @MarkSailes3, Mark's blog: mark-sailes.medium.com
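The episode covers the Java library; to keep this document's examples in one language, here is the same idea sketched with the sibling Powertools for AWS Lambda (Python), which expresses structured logging, X-Ray tracing, and EMF metrics as decorators rather than Java annotations. Service and metric names are illustrative.

```python
from aws_lambda_powertools import Logger, Metrics, Tracer
from aws_lambda_powertools.metrics import MetricUnit

logger = Logger(service="payments")                       # structured JSON logs to stdout
tracer = Tracer(service="payments")                       # X-Ray subsegments and annotations
metrics = Metrics(namespace="MyApp", service="payments")  # metrics emitted via EMF

@logger.inject_lambda_context(log_event=True)  # adds invocation context to every log line
@tracer.capture_lambda_handler                 # traces the whole handler
@metrics.log_metrics                           # flushes buffered EMF metrics on return
def handler(event, context):
    logger.append_keys(order_id=event.get("order_id"))  # extra context in the JSON logs
    logger.info("processing payment")
    metrics.add_metric(name="PaymentProcessed", unit=MetricUnit.Count, value=1)
    return {"statusCode": 200}
```

Because EMF metrics are plain log lines that CloudWatch extracts asynchronously, this avoids the expensive synchronous CloudWatch calls mentioned in the episode.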
Luciano and Eoin take a deep dive into SQS as part of a series on AWS event services and event-driven architecture. We talk about the kinds of problems SQS can solve, all of the SQS features, and how to configure and use SQS to achieve reliability and scalability without all the complexity. We also take some time to detail how SQS works with Lambda in terms of scaling, batching, and filtering.

In this episode, we gave a special mention to three highly recommended re:Invent 2021 talks on the topic of Enterprise Integration Patterns with AWS services:
- AWS re:Invent 2021 - Application integration patterns for microservices - Gregor Hohpe - https://www.youtube.com/watch?v=ttJAIQf7cTw
- AWS re:Invent 2021 - Building next-gen applications with event-driven architectures - Sam Dengler - https://www.youtube.com/watch?v=U5GZNt0iMZY
- AWS re:Invent 2021 - Application integration patterns for microservices - Dirk Froehner - https://www.youtube.com/watch?v=QhfuzEkN3Ck

In addition, we mentioned the following resources:
- SQS: https://aws.amazon.com/sqs/
- Using Lambda with SQS: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
- Lambda SQS Scaling: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-sqs-scaling/
- SLIC Watch (Serverless plugin for easy dashboards and alarms): https://github.com/fourTheorem/slic-watch

This episode is also available on YouTube: https://www.youtube.com/AWSBites

You can listen to AWS Bites wherever you get your podcasts:
- Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017
- Spotify: https://open.spotify.com/show/3Lh7PzqBFV6yt5WsTAmO5q
- Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw==
- Breaker: https://www.breaker.audio/aws-bites
- RSS: https://anchor.fm/s/6a3312a0/podcast/rss

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige
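On the batching point: a Lambda consuming a queue receives messages in batches, and with partial batch responses only the failed ones return to the queue. A minimal handler sketch, assuming the event source mapping has ReportBatchItemFailures enabled; the business logic is hypothetical.

```python
import json

def process(message: dict) -> None:
    # Hypothetical business logic; raise an exception to signal a failure.
    ...

def handler(event, context):
    # Batch size, batching window, and event filtering are all configured on
    # the event source mapping itself, not in code.
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message; successfully processed ones are
            # deleted from the queue, and these retry (or hit the DLQ).
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without this response shape, a single bad message would force the whole batch back onto the queue, which is exactly the reprocessing complexity the episode warns about.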
In this episode of the AWS Bites podcast, Luciano and Eoin talk about AWS services related to events and message passing, like SQS, SNS, EventBridge, Kinesis, and Kafka (MSK). We discuss in which contexts it is convenient to use messages and events, and we deliver a quick walkthrough of all the services, discussing major features and some practical examples of how to use them.

In this episode we mentioned the following resources:
- SNS: https://aws.amazon.com/sns/
- SQS: https://aws.amazon.com/sqs/
- EventBridge: https://aws.amazon.com/eventbridge/
- Kinesis: https://aws.amazon.com/kinesis/
- Kafka (MSK): https://aws.amazon.com/msk/

This episode is also available on YouTube: https://www.youtube.com/AWSBites

You can listen to AWS Bites wherever you get your podcasts:
- Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017
- Spotify: https://open.spotify.com/show/3Lh7PzqBFV6yt5WsTAmO5q
- Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw==
- Breaker: https://www.breaker.audio/aws-bites
- RSS: https://anchor.fm/s/6a3312a0/podcast/rss

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige
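As one practical example of combining the services above, here is a sketch of the classic SNS-to-SQS fan-out with boto3. The topic and queue names are hypothetical, and the queue access policy that lets the topic deliver messages is omitted for brevity.

```python
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Hypothetical topic and queue names.
topic_arn = sns.create_topic(Name="order-events")["TopicArn"]
queue_url = sqs.create_queue(QueueName="order-emails")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Every queue subscribed to the topic gets its own copy of each message.
# Note: a queue policy granting sns.amazonaws.com SendMessage is also required.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=queue_arn,
    Attributes={"RawMessageDelivery": "true"},  # skip the SNS JSON envelope
)

sns.publish(TopicArn=topic_arn, Message='{"orderId": "1234"}')
```

The design point: SNS handles the one-to-many delivery while each SQS queue gives its consumer independent buffering and retry, which is why the two are so often paired.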
In this episode the team continue their conversation on the Well-Architected Framework with the Reliability Pillar. It has four sections: foundations, workload architecture, change management, and failure management. If you're building in a traditional way, then reliability can be a huge amount of work. But we're serverless heads. AWS makes it a lot easier to do some of these things. A lot of service quotas and constraints are baked into the foundations. From a change management perspective, you want to get into the continuous delivery kind of mindset, so there's a lot of monitoring if you use the modern tools. From a failure management perspective, serverless is ephemeral, so it's built for retries. You can design so that those areas are slightly easier to work with.

When you look at the foundations section of the Reliability Pillar, you're probably looking at how to plan and not over-provision or overspend, and how to scale up effectively. You put a lot of time into the foundations section, whereas if you are serverless you still need to look at foundations, but I think it's less intensive. So you're looking at things like DNS, or how you write effectively, and things like that. You still need to worry about quotas and some of the constraints and resource constraints of your account. But you're starting from a more mature reliability standpoint when you adopt a serverless and serviceful approach.

So from a change management point of view, adapting to change is built into serverless capabilities and how managed services operate. From a failure management point of view, a lot of that is baked in, especially if you've built an event-driven asynchronous workload using SNS, SQS or EventBridge. A lot of circuit breaker type mentality, retries, and dead letter queues are coming out of the box now (see the sketch below). Increasingly they are maturing those capabilities to make it easier for teams to have a default, resilient, and reliable capability.

With serverless architecture we tend to take a micro path: microservice or micro frontend. It's opinionated, so there are only so many ways to connect these things. It's aimed at speed, cost-effectiveness, and reliability. If you are using Lambda, it's six AZs wide in terms of the HA side of things. It's the same with DynamoDB or Aurora. There's a lot of stuff that AWS has thought about for you in relation to the foundational side, and you can benefit from that. And that influences how you actually assemble your workload in terms of the workload architecture, as you've got to work within those constraints.

What's interesting about the Reliability Pillar and Well-Architected is that it has been there for eight years, but they've just recently added this new section, which is workload architecture. And one of the questions is: how do you design interactions in a distributed system to prevent failures? So there's a specific section on how to design distributed workloads. That's more than a nod. It's proof that a lot of customers are moving towards distributed microservice and modern application stacks. There's a lot of depth in that. To do that well, you need upfront design. You need to think about your system before you design it. You can't code your way out of it. Serverless makes some of these things easier, but the time that you don't spend on retries is put into domain design. It's an equal amount of effort and it's still challenging, but you're putting precious time into system design, as opposed to tuning a non-functional thing.
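A minimal sketch of the out-of-the-box retry and dead letter queue wiring mentioned above, using boto3 and hypothetical queue names; the point is that redrive behavior is a queue attribute, not code you write.

```python
import json

import boto3

sqs = boto3.client("sqs")

# Create the dead-letter queue first (hypothetical names).
dlq_url = sqs.create_queue(QueueName="orders-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Main queue: after 5 failed receives, SQS moves the message to the DLQ
# automatically; there is no circuit-breaker code in your workload.
sqs.create_queue(
    QueueName="orders",
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        ),
        "VisibilityTimeout": "30",  # seconds a consumer has before a retry
    },
)
```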
It's still hard to build the systems, but I firmly believe you get a better system at the end of the day. Some of the questions look at: if we auto-scale this serverless component, what sort of load or pressure is put on something that doesn't auto-scale? It forces you to think about protection, and where the choke points are. Should you be throttling your workload? Should you be setting constraints on the scalability of your serverless workload? Those types of questions get to something that we're very passionate about, which is testing for resilience and continuous resiliency, and having test days or game days that tease out where the choke points are. Where are your failure cases? Where are the downstream systems that can't respond or can't take the load as you pass it to them?

All of us agree that it is easier to do with serverless, and it's important that the setup is designed properly. But you can get good feedback by testing for this stuff. And you've seen the maturity of the Fault Injection Service that has come out. It's easy to use, and I'm hoping to see that evolve and mature to be much more serverless-focused as well. But it's a lot easier to test for resiliency. So you're not guessing anymore. You have real automation, and you've baked that into your CI/CD pipeline as well.

Serverless Craic from The Serverless Edge
theserverlessedge.com
@ServerlessEdge
Pull your podcast player out of instant retrieval, because we're discussing re:Invent 2021 as well as the weeks before it. Lots of announcements: big, small, weird, awesome, and everything in between. We had fun with this episode and hope you do too. Find us at melb.awsug.org.au or as @AWSMelb on Twitter.

News

Finally in Sydney
- AWS Snowcone SSD is now available in the US East (Ohio), US West (San Francisco), Asia Pacific (Singapore), Asia Pacific (Sydney) and AWS Asia Pacific (Tokyo) regions
- Amazon EC2 M6i instances are now available in 5 additional regions

Serverless
- Introducing Amazon EMR Serverless in preview
- Announcing Amazon Kinesis Data Streams On-Demand
- Announcing Amazon Redshift Serverless (Preview)
- Introducing Amazon MSK Serverless in public preview
- Introducing Amazon SageMaker Serverless Inference (preview)
- Simplify CI/CD Configuration for AWS Serverless Applications and your favorite CI/CD system – General Availability
- Amazon AppStream 2.0 launches Elastic fleets, a serverless fleet type
- AWS Chatbot now supports management of AWS resources in Slack (Preview)

Lambda
- AWS Lambda now supports partial batch response for SQS as an event source
- AWS Lambda now supports cross-account container image pulling from Amazon Elastic Container Registry
- AWS Lambda now supports mTLS Authentication for Amazon MSK as an event source
- AWS Lambda now logs Hyperplane Elastic Network Interface (ENI) ID in AWS CloudTrail data events

Step Functions
- AWS Step Functions Synchronous Express Workflows now supports AWS PrivateLink

Amplify
- Introducing AWS Amplify Studio
- AWS Amplify announces the ability to override Amplify-generated resources using CDK
- AWS Amplify announces the ability to add custom AWS resources to Amplify-created backends using CDK and CloudFormation
- AWS Amplify UI launches new Authenticator component for React, Angular, and Vue
- AWS Amplify announces the ability to export Amplify backends as CDK stacks to integrate into CDK-based pipelines
- AWS Amplify expands its Notifications category to include in-app messaging (Developer Preview)
- AWS Amplify announces a redesigned, more extensible GraphQL Transformer for creating app backends quickly

Containers

Fargate
- Announcing AWS Fargate for Amazon ECS Powered by AWS Graviton2 Processors

ECS
- Amazon ECS now adds container instance health information
- Amazon ECS has improved Capacity Providers to deliver faster Cluster Auto Scaling
- Amazon ECS-optimized AMI is now available as an open-source project
- Amazon ECS announces a new integration with AWS Distro for OpenTelemetry

EKS
- Amazon EKS on AWS Fargate now Supports the Fluent Bit Kubernetes Filter
- Amazon EKS adds support for additional cluster configuration options using AWS CloudFormation
- Visualize all your Kubernetes clusters in one place with Amazon EKS Connector, now generally available
- AWS Karpenter v0.5 Now Generally Available
- AWS customers can now find, subscribe to, and deploy third-party applications that run in any Kubernetes environment from AWS Marketplace

Other
- Amazon ECR announces pull through cache repositories
- AWS App Mesh now supports ARM64-based Envoy Images

EC2 & VPC

Instances
- New – EC2 Instances (G5) with NVIDIA A10G Tensor Core GPUs | AWS News Blog
- Announcing new Amazon EC2 G5g instances powered by AWS Graviton2 processors
- Introducing Amazon EC2 R6i instances
- Introducing two new Amazon EC2 bare metal instances
- Amazon EC2 Mac Instances now support hot attach and detach of EBS volumes
- Amazon EC2 Mac Instances now support macOS Monterey
- Announcing Amazon EC2 M1 Mac instances for macOS
- Announcing preview of Amazon Linux 2022
- Elastic Beanstalk supports AWS Graviton-based Amazon EC2 instance types
- Announcing preview of Amazon EC2 Trn1 instances
- Announcing new Amazon EC2 C7g instances powered by AWS Graviton3 processors
- Announcing new Amazon EC2 Im4gn and Is4gen instances powered by AWS Graviton2 processors
- Introducing the AWS Graviton Ready Program
- Introducing Amazon EC2 M6a instances
- AWS Compute Optimizer now offers enhanced infrastructure metrics, a new feature for EC2 recommendations
- AWS Compute Optimizer now offers resource efficiency metrics

Networking
- AWS price reduction for data transfers out to the internet
- Amazon Virtual Private Cloud (VPC) customers can now create IPv6-only subnets and EC2 instances
- Application Load Balancer and Network Load Balancer end-to-end IPv6 support
- AWS Transit Gateway introduces intra-region peering for simplified cloud operations and network connectivity
- Amazon Virtual Private Cloud (VPC) announces IP Address Manager (IPAM) to help simplify IP address management on AWS
- Amazon Virtual Private Cloud (VPC) announces Network Access Analyzer to help you easily identify unintended network access
- Introducing AWS Cloud WAN Preview
- Introducing AWS Direct Connect SiteLink

Other
- Recover from accidental deletions of your snapshots using Recycle Bin
- Amazon EBS Snapshots introduces a new tier, Amazon EBS Snapshots Archive, to reduce the cost of long-term retention of EBS Snapshots by up to 75%
- Amazon CloudFront now supports configurable CORS, security, and custom HTTP response headers
- Amazon EC2 now supports access to Red Hat Knowledgebase
- Amazon EC2 Fleet and Spot Fleet now support automatic instance termination with Capacity Rebalancing
- AWS announces a new capability to switch license types for Windows Server and SQL Server applications on Amazon EC2
- AWS Batch introduces fair-share scheduling
- Amazon EC2 Auto Scaling Now Supports Predictive Scaling with Custom Metrics

Dev & Ops

New services
- Measure and Improve Your Application Resilience with AWS Resilience Hub | AWS News Blog
- Scalable, Cost-Effective Disaster Recovery in the Cloud | AWS News Blog
- Announcing general availability of AWS Elastic Disaster Recovery
- AWS announces the launch of AWS AppConfig Feature Flags in preview
- Announcing Amazon DevOps Guru for RDS, an ML-powered capability that automatically detects and diagnoses performance and operational issues within Amazon Aurora
- Introducing Amazon CloudWatch Metrics Insights (Preview)
- Introducing Amazon CloudWatch RUM for monitoring applications' client-side performance

IaC
- AWS announces Construct Hub general availability
- AWS Cloud Development Kit (AWS CDK) v2 is now generally available
- You can now import your AWS CloudFormation stacks into a CloudFormation stack set
- You can now submit multiple operations for simultaneous execution with AWS CloudFormation StackSets
- AWS CDK releases v1.126.0 - v1.130.0 with high-level APIs for AWS App Runner and hotswap support for Amazon ECS and AWS Step Functions

SDKs
- AWS SDK for Swift (Developer Preview)
- AWS SDK for Kotlin (Developer Preview)
- AWS SDK for Rust (Developer Preview)

CICD
- AWS Proton now supports Terraform Open Source for infrastructure provisioning
- AWS Proton introduces Git management of infrastructure as code templates
- AWS App2Container now supports Jenkins for setting up a CI/CD pipeline

Other
- Amazon CodeGuru Reviewer now detects hardcoded secrets in Java and Python repositories
- EC2 Image Builder enables sharing Amazon Machine Images (AMIs) with AWS Organizations and Organization Units
- Amazon Corretto 17 Support Roadmap Announced
- Amazon DevOps Guru now Supports Multi-Account Insight Aggregation with AWS Organizations
- AWS Toolkits for Cloud9, JetBrains and VS Code now support interaction with over 200 new resource types
- AWS Fault Injection Simulator now supports Amazon CloudWatch Alarms and AWS Systems Manager Automation Runbooks
- AWS Device Farm announces support for testing web applications hosted in an Amazon VPC
- Amazon CloudWatch now supports anomaly detection on metric math expressions
- Introducing Amazon CloudWatch Evidently for feature experimentation and safer launches
- New – Amazon CloudWatch Evidently – Experiments and Feature Management | AWS News Blog
- Introducing AWS Microservice Extractor for .NET

Security
- AWS Secrets Manager increases secrets limit to 500K per account
- AWS CloudTrail announces ErrorRate Insights
- AWS announces the new Amazon Inspector for continual vulnerability management
- Amazon SQS Announces Server-Side Encryption with Amazon SQS-managed encryption keys (SSE-SQS)
- AWS WAF adds support for Captcha
- AWS Shield Advanced introduces automatic application-layer DDoS mitigation

Security Hub
- AWS Security Hub adds support for AWS PrivateLink for private access to Security Hub APIs
- AWS Security Hub adds three new FSBP controls and three new partners

SSO
- Manage Access Centrally for CyberArk Users with AWS Single Sign-On
- Manage Access Centrally for JumpCloud Users with AWS Single Sign-On
- AWS Single Sign-On now provides one-click login to Amazon EC2 instances running Microsoft Windows
- AWS Single Sign-On is now in scope for AWS SOC reporting

Control Tower
- AWS Control Tower now supports concurrent operations for detective guardrails
- AWS Control Tower now supports nested organizational units
- AWS Control Tower now provides controls to meet data residency requirements
- Deny services and operations for AWS Regions of your choice with AWS Control Tower
- AWS Control Tower introduces Terraform account provisioning and customization

Data Storage & Processing

Databases

Relational databases
- Announcing Amazon RDS Custom for SQL Server
- New Multi-AZ deployment option for Amazon RDS for PostgreSQL and for MySQL; increased read capacity, lower and more consistent write transaction latency, and shorter failover time (Preview)
- Amazon RDS now supports cross account KMS keys for exporting RDS Snapshots
- Amazon Aurora supports MySQL 8.0
- Amazon RDS on AWS Outposts now supports backups on AWS Outposts

Athena
- Amazon Athena adds cost details to query execution plans
- Amazon Athena announces cross-account federated query
- New and improved Amazon Athena console is now generally available
- Amazon Athena now supports new Lake Formation fine-grained security and reliable table features
- Announcing Amazon Athena ACID transactions, powered by Apache Iceberg (Preview)

Redshift
- Announcing preview for write queries with Amazon Redshift Concurrency Scaling
- Amazon Redshift announces native support for SQLAlchemy and Apache Airflow open-source frameworks
- Amazon Redshift simplifies the use of other AWS services by introducing the default IAM role
- Announcing Amazon Redshift cross-region data sharing (preview)
- Announcing preview of SQL Notebooks support in Amazon Redshift Query Editor V2

Neptune
- Announcing AWS Graviton2-based instances for Amazon Neptune
- AWS releases open source JDBC driver to connect to Amazon Neptune

MemoryDB
- Amazon MemoryDB for Redis now supports AWS Graviton2-based T4g instances and a 2-month Free Trial

Database Migration Service
- AWS Database Migration Service now supports parallel load for partitioned data to S3
- AWS Database Migration Service now supports Kafka multi-topic
- AWS Database Migration Service now supports Azure SQL Managed Instance as a source
- AWS Database Migration Service now supports Google Cloud SQL for MySQL as a source
- Introducing AWS DMS Fleet Advisor for automated discovery and analysis of database and analytics workloads (Preview)
- AWS Database Migration Service now offers a new console experience, AWS DMS Studio
- AWS Database Migration Service now supports Time Travel, an improved logging mechanism

Other
- Database Activity Streams now supports Graviton2-based instances
- Amazon Timestream now offers faster and more cost-effective time series data processing through scheduled queries, multi-measure records, and magnetic storage writes
- Amazon DynamoDB announces the new Amazon DynamoDB Standard-Infrequent Access table class, which helps you reduce your DynamoDB costs by up to 60 percent
- Achieve up to 30% better performance with Amazon DocumentDB (with MongoDB compatibility) using new Graviton2 instances

S3
- Amazon S3 on Outposts now delivers strong consistency automatically for all applications
- Amazon S3 Lifecycle further optimizes storage cost savings with new actions and filters
- Announcing the new Amazon S3 Glacier Instant Retrieval storage class - the lowest cost archive storage with milliseconds retrieval
- Amazon S3 Object Ownership can now disable access control lists to simplify access management for data in S3
- Amazon S3 Glacier storage class is now Amazon S3 Glacier Flexible Retrieval; storage price reduced by 10% and bulk retrievals are now free
- Announcing the new S3 Intelligent-Tiering Archive Instant Access tier - Automatically save up to 68% on storage costs
- Amazon S3 Event Notifications with Amazon EventBridge help you build advanced serverless applications faster
- Amazon S3 console now reports security warnings, errors, and suggestions from IAM Access Analyzer as you author your S3 policies
- Amazon S3 adds new S3 Event Notifications for S3 Lifecycle, S3 Intelligent-Tiering, object tags, and object access control lists

Glue
- AWS Glue DataBrew announces native console integration with Amazon AppFlow
- AWS Glue DataBrew now supports custom SQL statements to retrieve data from Amazon Redshift and Snowflake
- AWS Glue DataBrew now allows customers to create data quality rules to define and validate their business requirements

FSx
- Introducing Amazon FSx for OpenZFS
- Amazon FSx for Lustre now supports linking multiple Amazon S3 buckets to a file system
- Amazon FSx for Lustre can now automatically update file system contents as data is deleted and moved in Amazon S3
- Announcing the next generation of Amazon FSx for Lustre file systems

Backup
- Announcing preview of AWS Backup for Amazon S3
- AWS Backup adds support for Amazon Neptune
- AWS Backup adds support for Amazon DocumentDB (with MongoDB compatibility)
- AWS Backup provides new resource assignment rules for your data protection policies
- AWS Backup adds support for VMware workloads

Other
- AWS Lake Formation now supports AWS PrivateLink
- AWS Transfer Family adds identity provider options and enhanced monitoring capabilities
- Introducing ability to connect to EMR clusters in different subnets in EMR Studio
- AWS Snow Family now supports external NTP server configuration
- Announcing data tiering for Amazon ElastiCache for Redis
- Now execute python files and notebooks from another notebook in EMR Studio
- AWS Snow Family launches offline tape data migration capability

AI & ML

SageMaker
- Introducing Amazon SageMaker Canvas - a visual, no-code interface to build accurate machine learning models
- Announcing Fully Managed RStudio on Amazon SageMaker for Data Scientists | AWS News Blog
- Amazon SageMaker now supports inference testing with custom domains and headers from SageMaker Studio
- Amazon SageMaker Pipelines now supports retry policies and resume
- Announcing new deployment guardrails for Amazon SageMaker Inference endpoints
- Amazon announces new NVIDIA Triton Inference Server on Amazon SageMaker
- Amazon SageMaker Pipelines now integrates with SageMaker Model Monitor and SageMaker Clarify
- Amazon SageMaker now supports cross-account lineage tracking and multi-hop lineage querying
- Introducing Amazon SageMaker Inference Recommender
- Introducing Amazon SageMaker Ground Truth Plus: Create high-quality training datasets without having to build labeling applications or manage the labeling workforce on your own
- Amazon SageMaker Studio Lab (currently in preview), a free, no-configuration ML service
- Amazon SageMaker Studio now enables interactive data preparation and machine learning at scale within a single universal notebook through built-in integration with Amazon EMR

Other
- General Availability of Syne Tune, an open-source library for distributed hyperparameter and neural architecture optimization
- Amazon Translate now supports AWS KMS Encryption
- Amazon Kendra releases AWS Single Sign-On integration for secure search
- Amazon Transcribe now supports automatic language identification for streaming transcriptions
- AWS AI for data analytics (AIDA) partner solutions
- Introducing Amazon Lex Automated Chatbot Designer (Preview)
- Amazon Kendra launches Experience Builder, Search Analytics Dashboard, and Custom Document Enrichment

Other Cool Stuff
- In The Works – AWS Canada West (Calgary) Region | AWS News Blog
- Unified Search in the AWS Management Console now includes blogs, knowledge articles, events, and tutorials
- AWS DeepRacer introduces multi-user account management
- Amazon Pinpoint launches in-app messaging as a new communications channel
- Amazon AppStream 2.0 Introduces Linux Application Streaming
- Amazon SNS now supports publishing batches of up to 10 messages in a single API request
- Announcing usability improvements in the navigation bar of the AWS Management Console
- Announcing General Availability of Enterprise On-Ramp
- Announcing preview of AWS Private 5G
- AWS Outposts is Now Available in Two Smaller Form Factors
- Introducing AWS Mainframe Modernization - Preview
- Introducing the AWS Migration and Modernization Competency
- Announcing AWS Data Exchange for APIs
- Amazon WorkSpaces introduces Amazon WorkSpaces Web
- Amazon SQS Enhances Dead-letter Queue Management Experience For Standard Queues
- Introducing AWS re:Post, a new, community-driven, questions-and-answers service
- AWS Resource Access Manager enables support for global resource types
- AWS Ground Station launches expanded support for Software Defined Radios in Preview
- Announcing Amazon Braket Hybrid Jobs for running hybrid quantum-classical workloads on Amazon Braket
- Introducing AWS Migration Hub Refactor Spaces - Preview

Well-Architected Framework
- Customize your AWS Well-Architected Review using Custom Lenses
- New Sustainability Pillar for the AWS Well-Architected Framework

IoT
- Announcing AWS IoT RoboRunner, Now Available in Preview
- AWS IoT Greengrass now supports Microsoft Windows devices
- AWS IoT Core now supports Multi-Account Registration certificates on IoT Credential Provider endpoint
- Announcing AWS IoT FleetWise (Preview), a new service for transferring vehicle data to the cloud more efficiently
- Announcing AWS IoT TwinMaker (Preview), a service that makes it easier to build digital twins
- AWS IoT SiteWise now supports hot and cold storage tiers for industrial data
- New connectivity software, AWS IoT ExpressLink, accelerates IoT development (Preview)
- AWS IoT Device Management Fleet Indexing now supports two additional data sources (Preview)

Connect
- Amazon Connect now enables you to create and orchestrate tasks directly from Flows
- Amazon Connect launches scheduled tasks
- Amazon Connect launches Contact APIs to fetch and update contact details programmatically
- Amazon Connect launches API to configure security profiles programmatically
- Amazon Connect launches APIs to archive and delete contact flows
- Amazon Connect now supports contact flow modules to simplify repeatable logic

Sponsors
- CMD Solutions

Silver Sponsors
- Cevo
- Versent
About Maciej
Maciej Winnicki is a serverless enthusiast with over 6 years of experience in writing software with no servers whatsoever. Serverless Engineer at Stedi, Cloudash founder, ex-Engineering Manager, and one of the early employees at Serverless Inc.

Links:
- Cloudash: https://cloudash.dev
- Maciej Winnicki Twitter: https://twitter.com/mthenw
- Tomasz Łakomy Twitter: https://twitter.com/tlakomy
- Cloudash email: hello@cloudash.dev

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: This episode is sponsored in part by our friends at Rising Cloud, which I hadn't heard of before, but they're doing something vaguely interesting here. They are using AI, which is usually where my eyes glaze over and I lose attention, but they're using it to help developers be more efficient by reducing repetitive tasks. So, the idea being that you can run stateless things without having to worry about scaling, placement, et cetera, and the rest. They claim significant cost savings, and they're able to wind up taking what you're running as it is in AWS with no changes, and run it inside of their data centers that span multiple regions. I'm somewhat skeptical, but their customers seem to really like them, so that's one of those areas where I really have a hard time being too snarky about it because when you solve a customer's problem and they get out there in public and say, “We're solving a problem,” it's very hard to snark about that. Multus Medical, Construx.ai, and Stax have seen significant results by using them. And it's worth exploring. So, if you're looking for a smarter, faster, cheaper alternative to EC2, Lambda, or Batch, consider checking them out. Visit risingcloud.com/benefits. That's risingcloud.com/benefits, and be sure to tell them that I sent you because watching people wince when you mention my name is one of the guilty pleasures of listening to this podcast.

Corey: Welcome to Screaming in the Cloud. I'm Cloud Economist Corey Quinn. And my guest today is Maciej Winnicki, who is the founder of Cloudash. Now, before I dive into the intricacies of what that is, I'm going to just stake out a position that one of the biggest painful parts of working with AWS in any meaningful sense, particularly in a serverless microservices way, is figuring out what the hell's going on in the environment. There's a bunch of tools offered to do this and they're all—yeee, they aspire to mediocrity. Maciej, thank you for joining me today.
Maciej: Thank you for having me.

Corey: So, I turned out to have accidentally blown up Cloudash, sort of before you were really ready for the attention. You, I think, tweeted about it or put it on Hacker News or something; I stumbled over it because it turns out that anything that vaguely touches cloud winds up in my filters, because of awesome technology and personality defects on my part. And I tweeted about it as I set it up and got the thing running, and apparently this led to a surge of attention on this thing that you've built. So, let me start off with an apology. Oops, I didn't realize it was supposed to be a quiet launch.

Maciej: I actually thank you for that. Like, that was great. And we got a lot of attention from your tweet thread, actually, because at the end—that was the most critical part—at the end of the thread, you wrote that you're staying as a customer, so we have it on our website, and this is perfect. But actually, as you said, that's correct.

Our marketing strategy for releasing Cloudash was to post it on LinkedIn. I know this is not, kind of, the best strategy, but that was our plan. Like, it was like, hey, like, me and my friend Tomasz, who's also working on Cloudash, we thought, like, let's just post it on LinkedIn and we'll see how it goes. And suddenly, I'm receiving a notification from Twitter: “Hey, Corey started tweeting about it.” And I was like, “Oh, my God, I'm having a heart attack.” But then I read the, you know—

Corey: Oops.

Maciej: [laugh]. Yeah. I read the, kind of, conclusion, and I was super happy. And again, thank you for that, because this is actually when Cloudash kind of started rolling as a product and as a, kind of, business. So yeah, that was great.

Corey: To give a little backstory and context here: I write a whole bunch of serverless nonsense. I build API's Gateway, I hook them up to Lambda's Function, and then it sort of kind of works. Ish. From there, okay, I would try and track down what was going on, because in microservices land, everything becomes a murder mystery; you're trying to figure out what's broken, and things have exploded. And I became a paying customer of IOpipe. And then New Relic bought them. Well, crap.

Then I became a paying customer of Epsagon. And they got acquired by Cisco, at which point I immediately congratulated the founders, who I know on a social basis, and then closed my account because I wanted to get out before Cisco ruins it because, Cisco. Then it was, what am I going to use next? And right around that time is when I stumbled across Cloudash. And it takes a different approach than any other entity in the space that I've seen, because you are a native Mac desktop app. I believe you're Mac-only, but you seem to be Electron, so I could be way off base on that.

Maciej: So, we're on Linux as well right now, and soon we'll be on Windows as well. But yeah, right now it's macOS and Linux. Yeah, that's correct. So, our approach is a little bit different.

So, let me start by saying what's Cloudash?
Like, Cloudash is a desktop app for, kind of, monitoring and troubleshooting serverless architectures and services—like, serverless stuff in general. And the approach that we took is a little bit different, because we are not web-based, we're desktop-based. And there are a couple of advantages to that approach. The first one is that, like, you don't need to share your data with us, because we're not, kind of, downloading your metrics and logs to our backend to process them, et cetera, et cetera. We are just using the credentials, the AWS profiles, that you have defined on your computer, so nothing goes out of your AWS account.

And I think this is, like—considering it from the security perspective—very crucial. You don't need to create a role that gives us access, or anything like that. You just use the stuff that you have on your desktop, and everything stays in your AWS account. So, nothing—we don't download it, we don't process it, we don't do anything with it. That's one advantage. The other advantage is, like, kind of, onboarding, as I kind of mentioned, because we're using the AWS profiles that you have defined on your computer.

Corey: Well, you're doing significantly more than that, because I have a lot of different accounts configured different ways, and when I go to one of them that uses SSO, it automatically fires me off to the SSO login page if I haven't logged in that day for a 12-hour session—

Maciej: Yes.

Corey: —for things that have credentials stored locally, it uses those; and for things that are using role-chaining to assume roles from the things I have credentials for, and the things that I just do role assumption in, it works flawlessly. It just works the way that most of my command-line tools do. I've never seen a desktop app that does this.

Maciej: Yeah. So, we put a lot of effort into making sure that this works great, because we know that, like, no one will use Cloudash if there's—like, not no one, but like, we're targeting, like, serverless teams, maybe, in enterprise companies, or serverless teams working at some startups. And in most cases, those teams, those engineers, use SSO, or at least MFA, right? So, we have it covered. And as you said, like, the onboarding part is really easy, because you just pick your AWS profile, you just pick a region, and you just pick—right now—a CloudFormation stack, because we get the information about your service based on the CloudFormation stack. So yeah, we put a lot of effort into making sure that this works without any issues.

Corey: There are some challenges to it that I saw while setting it up, and that's also sort of the nature of the fact that you are, in fact, integrating with CloudWatch. For example, it's region-specific. Well, what if I want to have an app that's multi-region? Well, you're going to have a bad time, because doing [laugh] anything multi-region in AWS means you're going to have a bad time, and that gets particularly obnoxious when you're doing something like Lambda@Edge, where, oh, where do the logs live? That's going to be in a CloudFront distribution in whatever region it winds up being accessed from. So, it comes down to what distribution endpoint or point of presence did that particular request go through, and it becomes this giant game of whack-a-mole. It's frustrating, and it's obnoxious, and it's also in no way your fault.

Maciej: Yeah, I mean, we are at the beginning.
Right now, it's the most straightforward, kind of—how people think about stacks in serverless. They think in terms of regions, because I think, for now, multi-region or replicated stacks or things like that are not really popular yet. Maybe they will become so—like, this is how AWS works as a whole, so it's not surprising that we're kind of following this path. I think my point is that our main goal, the ultimate goal, is to make monitoring and, as I said, troubleshooting serverless apps as simple as possible.

So, once we hear from our customers, from our users, that, “Hey, we would like to get a little bit better experience around regions,” we will definitely implement that, because why not, right? And I think the whole point of Cloudash—and maybe we can go more deep into that later—is that we want to bring context into your metrics and logs. If you're seeing, for example, an X-Ray trace ID in your logs, you should be able, with one click, to just see that trace. It's not yet implemented in Cloudash, but we have it in the backlog. But my point is that, like, there should be some journey when you're debugging stuff, and you shouldn't be just, like, left alone with, like, 20 CloudWatch tabs open, trying to figure out where I was—like, where's the Lambda? Where are the API Gateway logs? Where are the CloudFront logs? And how can I kind of connect all of that? Because that's—it's an issue right now.

Corey: Even what you've done so far is incredibly helpful compared to the baseline experience that folks will often have. I can define a service that is comprised of a number of different functions—I have one set up right now that has seven functions in it—I grab any one of those things, and I can set how far the lookback is when I look at that function, ranging from 5 minutes to 30 days. And it shows me at the top the metrics of invocations, the duration that the function runs for, and the number of errors. And then, in the same pane down below it, it shows the CloudWatch logs. So, “Oh, okay, great. I can drag and zoom into a specific timeframe, and I see just the things inside of that.”

And I know this sounds like, well, what's the hard part here? Yeah, except nothing else does it in an easy-to-use, discoverable way that just sort of hangs out there. Honestly, the biggest win for me is that I don't have to log in to the browser and navigate through some ridiculous other thing to track down what I'm talking about. It hangs out on my desktop all the time, and whether it's open or not, whenever I fire it up, it just works, basically, and I don't have to think about it. It reduces the friction from, “This thing is broken,” to, “Let me see what the logs say.”

Very often I can go from not having it open at all to staring at the logs, with maybe having to wait a minute because there's some latency before the event happens and it hits CloudWatch Logs itself. I'm pretty impressed with it, and I've been keeping an eye on what this thing is costing me. It is effectively nothing in terms of CloudWatch retrieval charges, because it's not sitting there sucking all this data up all the time for everything that's running. Like, we've all seen the monitoring system that winds up costing you more in ancillary fees than the thing it's monitoring. This doesn't do that.
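That on-demand, client-side pattern (pull a narrow window of metrics and logs straight from your own account with local credentials, rather than continuously ingesting everything) can be sketched with boto3. This illustrates the approach, not Cloudash's actual code; the profile, region, and function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical profile, region, and function name.
session = boto3.Session(profile_name="prod", region_name="eu-west-1")
cloudwatch = session.client("cloudwatch")
logs = session.client("logs")

end = datetime.now(timezone.utc)
start = end - timedelta(minutes=30)  # only the window you're debugging

# Invocations, errors, and p95 duration for one function, 1-minute resolution.
queries = [
    {
        "Id": query_id,
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Lambda",
                "MetricName": metric_name,
                "Dimensions": [{"Name": "FunctionName", "Value": "newsletter-render"}],
            },
            "Period": 60,
            "Stat": stat,
        },
    }
    for query_id, metric_name, stat in [
        ("invocations", "Invocations", "Sum"),
        ("errors", "Errors", "Sum"),
        ("duration", "Duration", "p95"),
    ]
]
metrics = cloudwatch.get_metric_data(
    MetricDataQueries=queries, StartTime=start, EndTime=end
)
for series in metrics["MetricDataResults"]:
    print(series["Id"], list(zip(series["Timestamps"], series["Values"]))[:3])

# The matching log lines from the same window, fetched on demand.
events = logs.filter_log_events(
    logGroupName="/aws/lambda/newsletter-render",
    startTime=int(start.timestamp() * 1000),
    endTime=int(end.timestamp() * 1000),
)
for event in events["events"]:
    print(event["message"], end="")
```

Because nothing polls continuously, you pay for metric and log retrieval only at the moment you actually look.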
I also—while we're talking about money, I want to make very clear—because disclaiming the direction the money flows in is always important—you haven't paid me a dime, ever, to my understanding. I am a paying customer at full price for this service, and I have been since I discovered it. And that is very much an intentional choice. You did not sponsor this podcast, you are not paying me to say nice things. We're talking because I legitimately adore this thing that you've built, and I want it to exist.

Maciej: That's correct. And again, thank you for that. [laugh].

Corey: It's true. You can buy my attention, but not my opinion. Now, to be clear, when I did that tweet thread, I did get the sense that this was something that you had built as sort of a side project, as a labor of love. It does not have VC behind it, of which I'm aware, and that's always going to, on some level, shade how I approach a service and how critical I'm going to be of it. Because, yeah, if you've raised a couple hundred million dollars and your user experience is trash, I'm going to call that out.

But if this is something where you just soft launched, yeah, I'm not going to be a jerk about weird usability bugs here. I might call it out as, “Ooh, this is an area for improvement,” but not, “What jackwagon thought of this?” I am trying to be a kinder, gentler Corey in the new year. But at the same time, I also want to be very clear that there's room for improvement on everything. What surprised me the most about this is how well you nailed the user experience despite not having a full team of people doing UX research.

Maciej: That was definitely a priority. So, maybe a little bit of history. So, I started working on Cloudash—I think it was April… 2019. I think? Yeah. It's 2021 right now. Or we're in 2022. [unintelligible 00:11:33].

Corey: Yeah. 2022, now. I—

Maciej: I'm sorry. [laugh].

Corey: —I've been screwing that up every time I write the dates myself. I'm with you.

Maciej: [laugh]. Okay, so I started working on Cloudash in 2020, April 2020.

Corey: There we go.

Maciej: So, after eight months, I released some beta—like, free; you could download it from GitHub. Like, you can still download it on GitHub, but at that time there was no license; you didn't have to buy a license to run it. So, it was, like, a very early, like, 0.3 version that was working, but sort of, like, [unintelligible 00:12:00] working. There were some bugs.

And that was the first time that I tweeted about it on Twitter. It got some attention, and, like, some people started using it. I got some feedback, very initial feedback. And I was like, every time I open Cloudash, I get the sense that, like, this is useful. I'm talking about my own tool, but like, [laugh] that's the thing.

So, further into the history. So, I'm kind of a serverless engineer myself. I am a software engineer; I started focusing on serverless in, like, 2015, 2016. I was working for Serverless Inc. as an early employee.

I was then working as an engineering manager for a couple of companies. I work as an engineering manager right now at Stedi; we're also, like, fully serverless. So I'm, kind of, trying to fix my own issues with serverless, or trying to improve the whole experience around serverless on AWS. So, that's the main reason why we're building Cloudash: because we want to improve the experience. And one use case I'm often mentioning is that, let's say that you're kind of on duty. Like, so in the middle of the night PagerDuty is calling you, so you need to figure out what's going on with your Lambda or API Gateway.

Corey: Yes. PagerDuty, the original [Call of Duty: Nagios 00:13:04]. “It's two in the morning; who is it?” “It's PagerDuty. Wake up, jackass.” Yeah.
We all had those moments.

Maciej: Exactly. So, PagerDuty is calling you, you're, kind of, in the middle of the night, and you're not sure what's going on. So, the thing that we want to optimize—the time from waking up to understanding what's going on with your serverless stuff—should be minimized. And that's the purpose of Cloudash as well. So, you should just run one tool, and you should immediately see what's going on. And that's the purpose.

And probably with one or two clicks, you should see the logs responsible—for example, in your Lambda. Again, like, that's exactly what we want to cover; that was the initial thing that we wanted to cover: to kind of minimize the time you spend on troubleshooting serverless apps. Because as we all know, kind of, the longer it's down, the less money you make, et cetera, et cetera, et cetera.

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free. That's snark.cloud/oci-free.

Corey: One of the things that I appreciate about this is that I have something like five different microservices now that power my newsletter production pipeline every week. And periodically, I'll make a change and something breaks, because testing is something that I should really get around to one of these days, but when I'm the only customer, cool. Doesn't really matter until suddenly I'm trying to write something and it doesn't work. Great. Time to go diving in, and I'm never in my best frame of mind for that, because I'm thinking about writing for humans, not writing for computers. And that becomes a challenge.

And okay, how do I get to figuring out exactly what is broken this time? Regression testing: it really should be a thing more than it has been for me.

Maciej: You should write those tests. [laugh].

Corey: Yeah. And then I fire this up, and okay, great. Which sub-service is it? Great. Okay, what happened in the last five minutes on that service? Oh, okay, it says it failed successfully in the logs. Okay, that's on me. I can't really blame you for that. But all right.

And then it's a matter of adding more [print or 00:14:54] debug statements, and understanding what the hell is going on—mostly that I'm bad at programming. And then it just sort of works from there. It's a lot easier, I guess, to reason about this from my perspective than it is to go through the CloudWatch dashboards, where it's, okay, here's a whole bunch of metrics on different graphs, most of which you don't actually care about—as opposed to the unified view that you offer—and then, “Oh, you want to look at logs? That's a whole separate sub-service.
And I'm sitting here going, “I don't know who designed this, but are there any windows in their house? My God.”It's just the saddest thing I can possibly experience when I'm in the middle of trying to troubleshoot. Let's be clear, when I'm troubleshooting, I am in no mood to be charitable to anyone or anything, so that's probably unfair to those teams. But by the same token, it's intensely frustrating when I keep smacking into limitations that get in my way while I'm just trying to get the thing up and running again.Maciej: As you mentioned about UX—like, we've spent a lot of time thinking about the UX, trying different approaches, trying to understand which metrics are the most important. And as we all know, kind of, serverless simplifies a lot of stuff, and there are, like, way fewer metrics that you need to look into when something is happening, but we want to make sure that the stuff that we show—which is duration, errors, and p95—is probably the most important in most cases, so, like, covering most of this stuff. So sorry, I didn't mention that before; it was very important from the very beginning. And also, like, literally, I spent a lot of time, like, working on the colors, which sounds funny, [laugh] but I wanted to get them right. We're not yet working on dark mode, but maybe soon.Anyways, the visual part, it's always close to my heart, so we spent a lot of time going back to what you just said. So, definitely the experience around using CloudWatch right now, and CloudWatch logs, CloudWatch metrics, is not really tailored for any specific use case because they have to be generic, right? Because AWS has, like, I don't know, like, 300, or whatever number of services, probably half of them producing logs—maybe not half, maybe—Corey: We shouldn't name a number because they'll release five more between now and when this publishes in 20 minutes.Maciej: [laugh]. So, CloudWatch has to be generic. What we want to do with Cloudash is to take those generic tools—because we use, of course, CloudWatch logs, CloudWatch metrics, we fetch data from them—but make the visual part more tailored for a specific use case—in our case, it's the serverless use case—and make sure that it really, kind of—it shows only the stuff that you need to see, not everything else. So again, like, that's the main purpose. And then one more thing, we—like, this is also some kind of measurement of success—we want to reduce the number of tabs that you need to have open in your browser when you're dealing with CloudWatch. So, we tried to put the most important stuff in one view so you don't need to flip between tabs, as you usually do when you try to understand some kind of broader scope, or broader context of your, you know, error in Lambda.Corey: What inspired you to do this as a desktop application? Because a lot of companies are doing similar things, as SaaS, as webapps. And I have to ask—you yourself are a self-described serverless engineer—it seems to me that building a webapp is sort of like the commonly described use case of a lot of serverless stuff. And you're sitting here saying, “Nope, it's desktop app time.” Which again, I'm super glad you did. It's exactly what I was looking for. How did you get here?Maciej: I'd been thinking about both types of apps. So, like, definitely a webapp was the initial idea of how to build something. Because as you said, like, that's the default mode.
Like, we are thinking webapp; like, let's build a webapp because I'm an engineer, right? There is some inspiration coming from Dynobase, which was made by a friend [unintelligible 00:18:55] who also lives in Poland—I didn't mention that; we're based in [Poznań 00:18:58], Poland.And when I started thinking about it, there are a lot of benefits to using this approach. The biggest benefit, as I mentioned, is security; and the second benefit is that it's just more, like, cost-effective because we don't need to run a backend, right? We don't need to download all your metrics, all your logs. I think, like, let's think about it, like, from this perspective. Listen, for everyone in the company to start working, they would have to download all of your stuff from your AWS account. Like, that sounds insane because you don't need all of that stuff elsewhere.Corey: Store multiple copies of it. Yeah, I—generally when I'm looking at this, I care about the last five to ten minutes.Maciej: Exactly.Corey: I don't—Maciej: Exactly.Corey: —really care what happened three-and-a-half years ago on this function. Almost always. But occasionally I want to look back at, “Oh, this has been breaking. How long has it been that way?” But I already have that in the AWS environment unless I've done the right thing and turned on, you know, log expiry.Maciej: Exactly. So, this is a lot of, like, I don't want to be, like, you know, mean to anyone but, like, that's a lot of waste. Like, that's a lot of waste of compute power because you need to download it; of cost because you need to get this data out of AWS, which you need to pay for, you know, get metric data and stuff like this. So, you need to—Corey: And almost all of it's—what is it? Write once, read never. Because it's, you don't generally look at these things.Maciej: Yeah, yeah. Exactly.Corey: And so much of this, too, for every invocation I have, even though it's low traffic stuff, there's the start line with a request ID and what version is running, and it tells me ‘latest.' Helpful. A single log line in this case says ‘200.' Why it says that, I couldn't tell you. And then it says ‘End request ID.' The end.Now, there's no way to turn that off unless you disable the ability to write to CloudWatch logs in the function, but ingest on that costs 50 cents a gigabyte, so okay, I guess that's AWS's money-making scam of the year. Good for them. But there's so much of that, it's like looking at—like, when things are working, it's like looking at a low traffic site that's behind a load balancer, where there's a whole—you have gigabytes, in some cases, of load balancer—of web server logs on the thing that's sitting in your auto-scaling group. And those logs are just load balancer health checks. 98% of it is just that.Same type of problem here, I don't care about that, I don't want to pay to store it, I certainly don't want to pay to store it twice. I get it, that makes an awful lot of sense. It also makes your security job a hell of a lot easier because you're not sitting on a whole bunch of confidential data from other people. Because, “Well, it's just logs. What could possibly be confidential in there?” “Oh, my sweet summer child, have you seen some of the crap people put in logs?”Maciej: I've seen many things in logs. I don't want to mention them. But anyways—and also, you know, like, usually when you give access to your AWS account, it can ruin you. You know, like, there might be a lot of—like, you need to really trust the company to give access to your AWS account.
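One mitigation is handing over only a narrowly scoped, read-only role. A minimal sketch of what such a policy could look like, created with boto3; the policy name is illustrative, and this is not Cloudash's actual policy:

```python
import json
import boto3

# Sketch only: a policy that allows reading CloudWatch metrics and logs
# and nothing else. The policy name is a made-up placeholder.
READ_ONLY_MONITORING = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData",
                "cloudwatch:ListMetrics",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:FilterLogEvents",
            ],
            "Resource": "*",
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="monitoring-read-only",
    PolicyDocument=json.dumps(READ_ONLY_MONITORING),
)
```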
Of course, in most cases, the roles are scoped to, you know, only CloudWatch stuff, actions, et cetera, et cetera, but you know, like, there are some situations in which something may not be properly provisioned. And then you give access to everything.Corey: And you can get an awful lot of data you wouldn't necessarily want out of that stuff. Give me just the PDF printout of last month's bill for a lot of environments, and I can tell you disturbing levels of detail about what your architecture is, just because when you—you can infer an awful lot.Maciej: Yeah.Corey: Yeah, I hear you. It makes your security story super straightforward.Maciej: Yeah, exactly. So, I think, just to repeat, like, some of my inspiration. And then when I started thinking about Cloudash, like, definitely one of the inspirations was Dynobase, the, kind of, GUI for, like, a more powerful UI for DynamoDB. So, if you're interested in that stuff, you can also check this out.Corey: Oh, yeah, I've been a big fan of that, too. That'll be a separate discussion on a different episode, for sure.Maciej: [laugh]. Yeah.Corey: But looking at all of this, looking at the approach of, the only real concern—well, not even a concern. The only real challenge I have with it for my use case is that when I'm on the road, the only thing that I bring with me for a computer is my iPad Pro. I'm not suggesting by any means that you should build this as a native iPad app; that strikes me as, like, 15 levels of obnoxious. But it does mean that sometimes I still have to go diving into the CloudWatch console when I'm not home. Which, you know, without this, without Cloudash, that's what I was doing originally anyway.Maciej: You're the only person that has requested that. And we will put that into the backlog, and we will get to that at some point. [laugh].Corey: No, no, no. The smart move is to offer me a specific enterprise tier pricing—Maciej: Oh, okay. [laugh].Corey: —that is eye-poppingly high. It's like, “Hey, if you want to subsidize feature development, we're thrilled to empower that.” But—Maciej: [laugh]. Yeah, yeah. To be honest, like, that would be hard to write [unintelligible 00:23:33] implement as an iPad app, or iPhone app, or whatever because then, like, what's the story behind it? Like, how can I get the credentials, right? It's not possible.Corey: Yeah, you'd have to have some fun with that. There are a couple of ways I can think of offhand, but then that turns into a sandboxing issue, and it becomes something where you have to store credentials locally, regardless, even if they're ephemeral. And that's not great. Maybe turn it into a webapp someday or something. Who knows.What I also appreciate is that we had a conversation when you first launched, and I wound up basically going on a Zoom call with you and more or less tearing apart everything you've built—in an ideally constructive way—but looking at a lot of the things you've changed on your website, you listened to an awful lot of feedback. You doubled your pricing, for example. Used to be ten bucks a month; now you're twenty. Great. I'm a big believer in charging more.You absolutely add that kind of value because it's, “Well, twenty bucks a month for a desktop app.
That sounds crappy.” It's, “Yeah, jackwagon, what's your time worth?” I was spending seven bucks a month in serverless charges, and 120 or 130 a month for Epsagon, and I was thrilled to pieces to be doing it because the value I got from being able to quickly diagnose what the hell was going on far outstripped the actual cost of doing these things. Don't fall into the trap of assuming that, well, I shouldn't pay for software; I can just do it myself. Your time is never free. People think it is, but it's not.Maciej: That's true. The original price of $9.99—I think that was the launch promo price. After some time, we decided—and after adding more features: API Gateway support—we decided that this is, like, solving way more problems, so, like, you should probably pay a little bit more for that. But you're kind of lucky because you subscribed to it when it was 9.99, and this will be your kind of prize for the end of, you know—Corey: Well, I'm going to argue with you after the show to raise the price on mine, just because it's true. It's the—you want to support the things that you want to exist in the world. I also like the fact that you offered an annual plan because I will go weeks without ever opening the app. And that doesn't mean it isn't adding value. It's that oh, yeah, I will need that now that I'm hitting these issues again.And if I'm paying on a monthly basis, and it shows up with a, “Oh, you got charged again.” “Well, I didn't use it this month; I should cancel.” And [unintelligible 00:25:44] to an awful lot of subscriber churn. But in the course of a year, if I don't have at least one instance in which case, wow, that ten minute span justified the entire $200 annual price tag, then, yeah, you built the wrong thing or it's not for me, but I can think of three incidents so far since I started using it in the past four months that have led to that being worth everything you will charge me a year, and then some, just because it made it so clear what was breaking.Maciej: So, in that regard, we are also thinking about team licenses; that's definitely on the roadmap. There will be some changes to that. And we're definitely working on more and more features. And if we're—like, the roadmap is mostly about supporting more and more AWS services, so right now it's Lambda, API Gateway; we're definitely thinking about SQS, SNS, to get some sense of how your messages are going through, probably something, like, DynamoDB metrics. And this is all kind of serverless, but why not go wider? Like, why not go to Fargate? Like, Fargate is theoretically serverless, but you know, like, it's serverless on—Corey: It's serverless with a giant asterisk next to it.Maciej: Yeah, [laugh] exactly. So, but why not? Like, it's exactly the same thing in terms of, there is some user flow, there is some user journey, when you want to debug something. You want to go from API Gateway, maybe to the container to see, I don't know, like, a DynamoDB metric or something like that, so it should be all easy. And this is definitely something.Later, why not EC2 metrics? Like, it would be a little bit harder. But I'm just saying, like, the first thing here is that right now, like, at this point, we are serverless, but once we cover serverless, why not go wider?
Why not support more and more services and just make sure that all those use cases are correctly modeled with the UI and UX, et cetera?Corey: That's going to be an interesting challenge, just because that feels like where a lot of the SaaS monitoring and observability tooling has gone. And then you fire this thing up, and it looks an awful lot like the AWS console. And it's, “Yeah, I just want to look at this one application that doesn't use any of the rest of those things.” Again, I have full faith and confidence in your ability to pull this off. You clearly have done that well based upon what we've seen so far. I just wonder how you're going to wind up tackling that challenge when you get there.Maciej: And maybe not EC2. Maybe I went too far. [laugh].Corey: Yeah, honestly, even in EC2-land, it feels like that is more or less a solved problem. If you want to treat it as a bunch of EC2, you can use Nagios. It's fine.Maciej: Yeah, totally.Corey: There are tools that have solved that problem. But not much that I've seen has solved the serverless piece the way that I want it solved. You have.Maciej: So, it's definitely a long road to make sure that the serverless—and by serverless, I mean serverless as AWS understands serverless, so including Fargate, for example. So, there's a lot of stuff that we can improve. There's a lot of stuff that we can make easier with Cloudash than it is with CloudWatch; just staying inside serverless, it will take us a lot of time to make sure that it's all correct. And correctly modeled, correctly designed, et cetera. So yeah, I went too far with EC2, sorry.Corey: Exactly. That's okay. We all go too far with EC2, I assure you.Maciej: Sorry, everyone using EC2 instances. [laugh].Corey: If people want to kick the tires on it, where can they find it?Maciej: They can find it on cloudash.dev.Corey: One D in the middle. That one throws me sometimes.Maciej: One D. Actually, after talking to you, we have a double-D domain as well, so you can also try ‘Clouddash' with double-D. [laugh].Corey: Excellent, excellent. Okay, that is fantastic. Because I keep trying to put the double-D in when I'm typing it in my search tool on my desktop, and it doesn't show up. And it's like, “What the—oh, right.” But yeah, we'll get there one of these days.Maciej: Only the domain. It's only the domain. You will be redirected to the single-D.Corey: Exactly.Maciej: [laugh].Corey: We'll have to expand later; I'll finance the feature request there. It'll go well. If people want to learn more about what you think about these things, where else can they find you?Maciej: On Twitter, and my Twitter handle is @mthenw. M-then-W, which is M-T-H—mthenw. And my co-founder is @tlakomy. You can probably add that to the [show notes 00:29:35]. [laugh].Corey: Oh, I certainly will. It's fine, yeah. Here's a whole bunch of letters. I hear you. My Twitter handle used to be my amateur radio callsign. It turns out most people don't think like that. And yeah, it's become an iterative learning process. Thank you so much for taking the time to speak with me today and for building this thing. I really appreciate both of them.Maciej: Thank you for having me here. I encourage everyone to visit cloudash.dev; if you have any feature requests, any questions, just send us an email at hello@cloudash.dev, or just go to the GitHub repository, in the issues; just create an issue, describe what you want, and we can talk about it.We are always happy to help.
The main purpose, the ultimate goal of Cloudash is to make the serverless engineer's life easier, on a very high level. And on a little bit lower level, it's just to make, you know, troubleshooting and debugging serverless apps easier.Corey: Well, from my perspective, you've succeeded.Maciej: Thank you.Corey: Thank you. Maciej Winnicki, founder of Cloudash. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment telling me exactly why I'm wrong for using an iPad to do these things, but not being able to send it because you didn't find a good way to store the credentials.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Jake
Technical Lead by day at the Met Office in the UK, leading a team of software developers delivering services for the UK. By night, gamer and fitness instructor, attempting to get a home cinema and gaming setup whilst corralling 3 cats, 2 rabbits, 2 fish tanks, and my wonderful girlfriend.Links: Met Office: https://www.metoffice.gov.uk Twitter: https://twitter.com/jakehendy TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: It seems like there is a new security breach every day. Are you confident that an old SSH key, or a shared admin account, isn't going to come back and bite you? If not, check out Teleport. Teleport is the easiest, most secure way to access all of your infrastructure. The open source Teleport Access Plane consolidates everything you need for secure access to your Linux and Windows servers—and I assure you there is no third option there. Kubernetes clusters, databases, and internal applications like AWS Management Console, Jenkins, GitLab, Grafana, Jupyter Notebooks, and more. Teleport's unique approach is not only more secure, it also improves developer productivity. To learn more visit: goteleport.com. And no, that is not me telling you to go away; it is goteleport.com. Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database that is not the BIND DNS server. If you're tired of managing open source Redis on your own, or you're using one of the vanilla cloud caching services, these folks have you covered with the go-to managed Redis service for global caching and primary database capabilities: Redis Enterprise. To learn more and deploy not only a cache but a single operational data platform for one Redis experience, visit redis.com/hero. That's r-e-d-i-s.com/hero. And my thanks to my friends at Redis for sponsoring my ridiculous nonsense. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It's often said that the sun never sets on the British Empire, but it's often very cloudy and hard to see the sun because many parts of it are dreary and overcast. Here to talk today about how we can predict those things in advance—in theory—is Jake Hendy, Tech Lead at the Met Office. Jake, thanks for joining me.Jake: Hey, Corey, it's lovely to be here. Thanks for inviting me on.Corey: There's a common misconception that it's startups in San Francisco, or the culture thereof, if you can even elevate it to being a culture above something you'd find in a petri dish, that is where cloud stuff happens, where the computer stuff is done. And I've always liked cutting against that. There are governments that are doing interesting things with Cloud; there are large companies and ‘move fast and break things' is the exact opposite of what you generally want from institutions that date back centuries. What's it like working on Cloud, something that for all intents and purposes didn't exist 20 years ago, in the context of a government office?Jake: As you can imagine, it was a bit of a foray into cloud for us when it first came around. We weren't one of the first people to jump.
The Met Office, we've got our own data centers, which we proudly sit on, that contain supercomputers and mainframes as well as a plethora of x86 hardware. So, we didn't move fast at the start, and nowadays, we don't move at breakneck speeds, but we like to take advantage of those managed services. It gets out of the way of managing things for us.Corey: Let's back up a second because I tend to be stereotypically American in many ways. What is the Met Office?Jake: What is the Met Office? The Met Office is the UK's National Meteorological Service. And what does that mean? We do a lot of things with meteorology, from weather forecasting and climate research from our Hadley Centre—which is world-renowned—down to observations, collections, and partnerships around the world. So, if you've been on a plane over Europe, the Middle East, Africa, over parts of Asia, that plane took off because the Met Office provided a forecast for that plane. There's a whole range of things we can talk about there, if you want, Corey, of what the Met Office actually does.Corey: Well, let's ask some of the baseline questions. You think of a weather office in a particular country as, oh okay, it tracks the weather in the area of operations for that particular country. Are you looking at weather on a global basis, on a somewhat local basis, or—as mentioned, since, due to a long many-century history, it turns out that there are UK Commonwealth territories scattered around the globe—where do you start? Where do you stop?Jake: We don't start and we don't stop. The Met Office is very much a 24/7 operation. So, we've got a 24/7 operations center with staff constantly manning it, doing all sorts of things. So, we've got defense—we work heavily with our defense colleagues, from UK armed forces to NATO partners; we've got aviation, as mentioned; we've got marine shipping—most of the listeners in the UK will have heard of the shipping forecast at one point or another. And we've got the private sector as well, from transport, to energy, supermarkets, and more. We have a very heavy UK focus, for obvious reasons, but our remit goes wide. You can actually go and see—some of our model data is actually on Amazon Open Data. We've got MOGREPS, which is our ensemble forecast, as well as global models and UK models, with a 24-hour time lag, but feel free to go and have a play. And you can see the wide variety of data that we produce in just those few models.Corey: Yeah, just pulling up your website now; looking at where I am here in San Francisco, it gives me a detailed hour-by-hour forecast. There are only two problems I see with it. The first is that it's using Celsius units, which I—Jake: [laugh].Corey: —as a matter of policy, don't believe in because in this country, we don't really use things that make sense in a measuring context. And also, I don't believe it's a real weather site because it's not absolutely festooned with advertisements for nonsense, which is apparently—I wasn't aware—a thing that you could have on the internet. I thought that showing weather data automatically meant that you had to attempt to cater to the lowest common denominator at all times.Jake: That's an interesting point there. So, the Met Office is owned and operated by Her Majesty's Government. We are a Trading Fund with the Department for Business, Energy and Industrial Strategy. And what does it mean that we're a Trading Fund? It means that we're funded by public money.
So, that's called the Public Weather Service.But we also offer a more commercial venture. So, depending on what extensions you've got going on in your browser, there are actually adverts that do run on our website, and we do this to help recover some of the cost. So, the Public Weather Service has to recover some of that. And then lots of things are funded by the Public Weather Service, from observations, to public forecasting. But then there are more of those commercial ventures, such as the energy markets, that have more paid products, and things like that as well. So, maybe not that many adverts, but definitely more usable.Corey: Yeah, I disabled the ad blocker, and I'm reloading it and I'm not seeing any here. Maybe I'm just considered to be such a poor ad targeting prospect at this point that people have just given up in despair. Honestly, people giving up on me in despair is kind of my entire shtick.Jake: We focus heavily on user-centered design, so I was fortunate in my previous team to work in our digital area, consumer digital, which looked after our web and mobile channels. And I can heartily say that a lot of changes had a lot of heavy research into them. Not just internal, getting [unintelligible 00:06:09] and having a look at it, but what does this actually mean for members of the public? Sending people out doing guerrilla public testing, standing outside Tescos—which is one of our large superstores here—and saying, “Hey, what do you think of this?” And then you'd get a variety of opinions, and then features would be adjusted, tweaked, and so on.Corey: So, you folks have been a relatively early adopter, especially in an institutional context. And by institution, I mean, one of those things that feels like it is as permanent as the stones in a castle, on some level, something that's lasted more than 20 years here in California, what a concept. And part of me wonders, were you one of the first UK government offices to use the cloud, and is that because you do weather and someone was very confused by what Cloud meant?Jake: [laugh]. I think we were possibly one of the first; I couldn't say if we were the first. Over in the UK, we've got a very capable network of government agencies doing some wonderful, and very cloudy, things. And the Government Digital Service was an initiative set up—uh, I can't remember, and I—unfortunately I can't remember the name of the report that caused its creation, but they had a big hand in doing design and cloud-first deployments. In the Met Office, we didn't take a, “Ah, screw it. Let's jump in,” we took a measured step into the cloud waters.Like I said, we've been running supercomputers since the '50s, and mainframes as well, and x86. I mean, we've been around for 100 years, so we constantly adapt, and engage, and iterate, and improve. But we don't just jump in and take a risk because like you said, we are an institution; we have to provide services for the public. It's not something that you can just ignore. These are services that protect life and property, both at home and abroad.Corey: You have provided a case study historically to AWS, about your use cases of what you use, back in 2014. It was, oh, you're a heavy user of EC2, and looking at the clock, and oh, it's 2014. Surprise. But you've also focused on other services as well.
I believe you personally provided a bit of a case study slash story around your use of Pinpoint of all things, which is a wrapper around SES, their email service, in the hopes of making it a little bit more, I guess, understandable slash fully-featured for contacting people, but in my experience is a great sales device to drive business to its competitors.What's it been like working, I guess, both simultaneously with the tried and true, tested, yadda, yadda, yadda, EC2 and RDS style stuff, but then looking at what else—you're deep into Lambda, and DynamoDB, and SQS sort of stands between both worlds, given it was the first service in beta, but it also is a very modern way of thinking about services. How do you contextualize all of that? Because AWS's product strategy, clearly, is “Yes.” And they'll build anything for anyone, is more or less what it seems. How do you think about the ecosystem of services that are available and apply it to problems that you're working on?Jake: So, in my personal opinion, I think the Met Office is one of a very small handful of companies around the world that could use every Amazon service that's offered, even things like Ground Station. But on my first day in the office, I went and sat at my desk and was talking to my new colleagues, and I looked to the left and he said, “Oh, yeah, that's a satellite dish collecting data from a satellite passing overhead.” So, we very much pick the best tool for the job. So, we have systems which do heavy number crunching, and very intense things; we'll go for EC2.We have systems that store data that needs relationships and all sorts of things. Fine, we'll go RDS. In my space, we have over a billion observations a year coming through the system I lead on, SurfaceNet. So, do we need RDS? No. What about if we use something like S3 and Glue and Athena to run queries against this?We're very fortunate that we can pick the best tool for the job, and we pride ourselves on getting the most out of our tools and getting the most value for money. Because like I said, we're funded by the taxpayer; the taxpayer wants value for money, and we are taxpayers ourselves. We don't want to see our money being wasted on a hundred-instance auto-scaling group when we could do it with Lambda instead.
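For a sense of what the S3-plus-Glue-plus-Athena pattern looks like from code, here is a rough boto3 sketch; the database, table, and output bucket are invented for illustration, not SurfaceNet's real schema:

```python
import time
import boto3

# Sketch only: run a SQL query over observations stored in S3 through
# Athena, using a Glue-catalogued table. All names here are made up.
athena = boto3.client("athena")

QUERY = """
    SELECT station_id, avg(air_temperature) AS avg_temp
    FROM observations
    WHERE year = 2021 AND month = 12
    GROUP BY station_id
"""

execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "surface_observations"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes; fine for a sketch, though a real
# pipeline would react to completion events instead of busy-waiting.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```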
Corey: It's fascinating talking about some of the forward-looking stuff, and oh, serverless and throw everything at Cloud and be all in on cloud. Cloud, cloud, cloud. Cloud is the future. But earlier this year, there was a press release where the Met Office and Microsoft are going to be joining forces to build the world's, and I quote, “Most powerful weather and climate forecasting supercomputer.” The government—your government, to be clear—is investing over a billion pounds in the project.It is slated to be online and running by the middle of next year, 2022, which for a government project, as I contextualize them, feels like it's underwear-on-outside-the-pants superhero speed. But that, I guess, is what happens when you start looking at these public-private partnerships in some respects. How do you contextualize that? What is the story behind, oh, we're—you're clearly investing heavily in cloud, but you're also building your own custom enormous supercomputer rather than just waiting for AWS to drop one at re:Invent. What does the decision-making process look like? What is the strategy behind it?Jake: Oh. [laugh]. So—I'll have to be careful here—supercomputing is something that we've been doing for a long time, since the '50s, and we've grown with that. When the Met Office moved offices from Bracknell in 2002, 2003, we ran two supercomputers for operational resilience; at that point [unintelligible 00:12:06] building in the new building; it was ready, and they were like, “Okay, let's move a supercomputer.” So, it came hurtling down the motorway, plugged in, and congrats, we've now got two supercomputers running again. We're very fortunate—Corey: We had one. It got lonely. We wanted to make it a friend. Yeah, I get it.Jake: Yeah. It's long distance; it works. And the Met Office is actually very good at running projects. We've done many supercomputers over the years, and supercomputing our models—we run some very intense models, and we have more demands. We know we can do better.We know there's the observations my group collects, there's the science that's continually improving and iterating and getting better, and our limit isn't poor optimizations or poorly written code. There are scientists running some fantastic code; we have a team who go and optimize these models, and you know, in one release, they may knock down a model runtime by four minutes. And you think, okay, that's four minutes, but for example, if that's four minutes across 400 nodes, all of a sudden you've now got 400 nodes that have then got four minutes more of compute. That could be more research, that could be a different model run. You know, we're very good at running these things, and we're very fortunate to be very technically capable, to understand the difference between a workload that belongs on AWS and a workload that belongs on a supercomputer.And you know, a supercomputer has many benefits, which the cloud providers… are getting into—you know, we have high performance clusters on Amazon and Azure, with, you know, InfiniBand networking. But sometimes you really can't beat a hunking great big ton of metal and super water-cooling, sat in a data center somewhere, backed by—we're very fortunate to have one hundred percent renewable energy for the supercomputer, which—if you look at the power requirements for any supercomputer, they're phenomenal—so we're throwing those credentials behind it for climate change as well. You can't beat a supercomputer sometimes.Corey: This episode is sponsored by our friends at Oracle. HeatWave is a new high-performance accelerator for the Oracle MySQL Database Service, although I insist on calling it “my squirrel.” While MySQL has long been the world's most popular open source database, shifting from transacting to analytics required way too much overhead and, ya know, work. With HeatWave you can run your OLTP and OLAP—don't ask me to ever say those acronyms again—workloads directly from your MySQL database and eliminate the time consuming data movement and integration work, while also performing 1100X faster than Amazon Aurora, and 2.5X faster than Amazon Redshift, at a third of the cost. My thanks again to Oracle Cloud for sponsoring this ridiculous nonsense. Corey: I'm somewhat fortunate in that, despite living in a world of web apps these days, my business partner used to work at the Department of Energy at Oak Ridge National Lab, helping with the care and feeding of the supercomputer clusters that they had out there. And you're absolutely right; that matches my understanding, with the idea that there are certain workloads you're not going to be able to beat just having this enormous purpose-built cluster sitting there ready to go. Or even if you can, certainly not economically.
I have friends who are in the batch side of the world, the HPC side of the world over in the AWS organization, and they keep saying, “Hey, look at this. This thing's amazing.”But so much of what they're talking about seems to distill down to, “I have this one-off giant compute task that needs to get done.” Yes, you're right. If I need to calculate the weather one time, then okay, I can make an argument for going with cloud, but you're doing this on what appears to be a pretty consistent basis. You're not just assuming—as best I can tell—that, “And starting next Wednesday, it will be sunny forever. The end.”Jake: I'm sure many people would love it if we could do weather on-demand.Corey: Oh, yes. [unintelligible 00:15:09] going to reserved instance weather. That would be great. Like, “All right. I'd like to schedule some rain, please.” It really seems like it's one of those areas that is most commonly accepted in science fiction without any real understanding of just what it would take to do something like that. Even understanding and predicting the weather is something that is beyond an awful lot of our current capabilities.Jake: This is exactly it. So, the Met Office is world-renowned for its research capabilities and those really in-depth, very powerful models that we run. So, I mentioned earlier, something called MOGREPS, which is the Met Office's ensemble-based model. And what do we mean by ensembles? You may see in the documentation it's got 18 members.What does that mean? It means that we actually run a simulation 18 times, and we tweak the starting parameters based on these real world inputs. And then you have a number of members that iterate through, and the supercomputer runs all of them. And we have deterministic models, which have one set of inputs. And you know, it's not just, as you say, one time; these models must run.There are a number of models we do, models on sea state as well, and they've all got to run, so we generally tend to run our supercomputers at top capacity. It's not often you get to go on a supercomputer and there'll be some space for your job to execute right this minute. And there's all the setup as well, so it's not just, okay, the supercomputer is ready to go, but there's all the things that go into it, like, those observations, whether it's from the surface, whether it's from satellite data passing overhead. We have our own lightning network, as well. We have many things, like a radar network that we own and operate. We collaborate with the Environment Agency for rainfall. And all these things, they feed into these models.Okay, now we produce a model, and now it's got to go out. So, it's got to come off the supercomputer, it's got to be processed, maybe the grid that we run the models on needs to be reprojected because different people feed maps in different ways. Then they've got to be cut up, because not every customer wants to know what the weather is everywhere. They've got a bit they care about. And of course, these models aren't small; you know, they can be terabytes, so there's also the case that customers might not want to download terabytes; that might cost them a lot. They might only be able to process gigabytes an hour.But then there's other products that we do processing on, so weather models, it might take 40 minutes to over an hour for a model to run. Okay, that's great. You might have missed the first step. Okay, well, we can enrich it with other data that's come in, things like nowcasting, where we do very short runs for the next six-hour forecast.
There's a whole number of things that run in the office. And we don't have a choice; they run operationally 24/7, around the clock.I mentioned to you before we started recording, we had an incident of the ‘Beast from the East' a number of years back. Some of your listeners may remember this; in the UK, we had a front come in from the east and the UK was blanketed with snow. It was a really severe event. We pretty much kept most of our services running. We worked really hard to make sure that they continued working.And personally I'd say, perhaps when you go shopping for Black Friday, you might go to a retailer and it's got a queue system up because, you know, it mimics that queue thing when you're outside a store, like in Times Square, and it's raining, be like, oh, I might get a deal a minute. I think possibly in the Met Office, we have almost the inverse problem. If the weather's benign, we're still there. People rely on us to go, “Yeah, okay. I can go out and have fun.” When the weather's bad, we don't have a choice. We have to be there because everybody wants us to be there, but we need to be there. It's not a case of this is an optional service.Corey: People often forget that yeah, we are living in a world in which, especially with climate change doing what it's doing, if you get this wrong, people can very easily die. That is not something to take lightly. It's not just about can I go outside and play a pickup game of basketball today?Jake: Exactly. So, you know, operationally, we have something called the National Severe Weather Warning Service, where we issue guidance and alerts across the UK, based on severe weather. And there's a number of different weather types that we issue guidance for. And the severity of that goes from yellow to amber to red. And these are manually generated products, so there's the chief meteorologist who's on shift, and he approves these.And these warnings don't just go out to members of the public. They go out to the Cabinet Office, they go out to first responders, they go out to a number of people who are interested in the weather and have a responsibility. But the other side is that we don't issue a weather warning willy-nilly. It's a measured, calculated decision by our very capable operations team. And once that weather system has passed, the weather story has changed, we'll review it. We go back and we say, what could we have done differently?Could the models have predicted this earlier? Could we have new data which would have picked up on this? Some of our next generation products that are in beta, would they have spotted this earlier? There's a lot of service review that continually goes on because, like I said, we are the best, and we need to stay the best. People rely on us.Corey: So, here's a question that probably betrays my own ignorance, and that's okay, that's what I'm here to do. When I was a kid, I distinctly remember—first, this is not the era where the world was black and white; I'm a child of the '80s, let's be clear here, so this is not old-timey nonsense quite as much—but I distinctly remember that it was a running gag how unreliable the weather report always was, and it was a bit hit or miss, like, “Well, the paper says it's going to be sunny today, but we're going to pack an umbrella because we know how this works.” It feels, and I could be way off base on this, but it really feels like weather forecasting has gotten significantly more accurate since I was a kid.
Is that just nostalgia, and I remember my parents complaining about it, or has there been a qualitative improvement in the accuracy of weather forecasting?Jake: I wish I could tell you all the scientific improvements that we've made, but there are many groups of scientists in the office who I would more than happily shift that responsibility over to, but quite simply, yes. We have a lot of partners we work with around the world—the National Weather Service, DWD in Germany, Meteo France, just to name but a few; there are many—and we all collaborate with data. We all iterate. You know, the American Meteorological Society holds a conference every year, which we attend. And there have been absolutely leaping changes in forecast quality and accuracy over the years.And that's why we continually upgrade our supercomputers. Like I said, yeah, there's research and stuff, but we're pulling in all this science, and meteorology is generally about very chaotic systems. We're still discovering many things around how the climate works and how the weather systems work. And we're going to use them to help improve quality of life, early warnings: actually, we can say, oh, in three days' time, it's going to be sunny at the beach. It would be great if you could know that seven days in advance. It would be great if you knew that 14 days in advance.I mean, we might not do that because at the moment, we might have an idea, but there's also the case of understanding, you know, it's a probability-based decision. And people say, “Oh, it's not going to rain.” But actually, it's a case of, well, we said there's a 20% probability it's going to rain. That doesn't mean it's not going to, but it's saying, “Two times out of ten, at this time it's going to rain.” But of course, if you go out 14 days, that's a long lead time, and you know, you talk about chaos theory, and the butterfly moves and flaps its wings, and all of a sudden a [cake 00:22:50] changes color from green to pink or something like that, at some other location in the world.These are real systems that have real impacts, so we have to balance out the science of pure numbers with, what do people do with it? And what can people do with it, as well? So, that's why we talk about having timely data as well. People say, “Well, you could run these simulations and all your products take longer to process them and generate them,” but for example, in SurfaceNet, we have five minutes to process an observation once it comes in. We could spend hours fine-tuning that observation to make it perfect, but it needs to be useful.
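The ensemble idea described above is easy to caricature in a few lines: run the same model repeatedly from slightly perturbed starting conditions, then read the probability off as the fraction of members that produce rain. A toy sketch only; the "physics" below is a deliberately silly stand-in, not real meteorology:

```python
import random

# Toy illustration of ensemble forecasting: 18 members, each started
# from slightly perturbed observations. Nothing here resembles MOGREPS'
# actual science; the model function is pretend physics.
MEMBERS = 18

def toy_model(humidity: float, pressure_hpa: float) -> bool:
    """Pretend physics: does this member end up raining?"""
    return humidity * (1015.0 - pressure_hpa) > 6.0

def rain_probability(obs_humidity: float, obs_pressure_hpa: float) -> float:
    raining = 0
    for _ in range(MEMBERS):
        # Each member perturbs the observed starting state a little.
        h = obs_humidity + random.gauss(0, 0.05)
        p = obs_pressure_hpa + random.gauss(0, 1.5)
        if toy_model(h, p):
            raining += 1
    return raining / MEMBERS

# "A 20% probability" then literally means: this fraction of members rained.
print(f"P(rain) = {rain_probability(0.72, 1008.0):.0%}")
```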
Corey: As you take a look throughout all of the things that AWS is doing—and sure, not all of these are going to necessarily apply directly to empowering the accuracy of weather forecasts, let's be clear here—but you have expressed personal interest in, for example, IoT, a bunch of the serverless nonsense we're seeing out there. What excites you the most? What has you the most enthusiastic about what the future of the cloud might hold? Because unlike almost everyone else I talk to in this space, you are not selling anything. You don't have a position—that I'm aware of—that oh, yeah, I super want to see this particular thing win the industry because that means you get to buy a boat.You work for the Met Office; you know that in some cases, oh, that boat is not going to have a great time in that part of the world anyway. I don't need one. So, you're a little bit more objective than most people I have pushing a corporate story. What excites you? Where do you see the future of this industry going in ways that are neat?Jake: Different parts of the office will tell you different things, you know. We worked with Google DeepMind on AI and machine learning. We work with many partners on AI and machine learning; we use it internally, as well. On a personal level, I like quality of life improvements and things that just make my life as a developer both fun and interesting. So, CDK was a big thing.I was a CloudFormation wizard—still hate writing YAML—but the CDK came along and it was [unintelligible 00:24:52] people wouldn't say, but that wasn't, like, know when Lambda launched back in, what, 2013? 2014? No, but it made our lives easier. It meant that actually, we didn't have to worry about, okay, how do we do templating with YAML? Do we have to run some pre-processors or something?It meant that we could invest a little bit of time upfront on CDK and migrating everything over, and then that freed us up to actually do things that we need for what we call the business or the organization—delivering value, you know? It's great playing with tech but, you know, I need to deliver value. And I think, what was it, in the Google SRE book, they limit the things they do that are toil—manual tasks that don't really contribute anything, they're more like keeping the lights on. Let's get rid of that. Let's focus on delivering value.It's why Lambda is so great. I could patch an EC2 instance, I can automate it, you know, you've got AWS Systems Manager Patch Manager, or… whatever its name is; they can go and manage all those patches for you. Why, when I can do it in a Lambda and not need to worry about it?Corey: So, one last question that I have for you is that you're a tech lead. It's easy for folks to fall into the trap of assuming, “Oh, you're a government. It's like an enterprise, only bigger, slower, and way, way, way busier.” How many hundreds of thousands of engineers are working at the Met Office along with you?Jake: So, you can have a look at our public report and you can see the number of staff we have. I think there's about 1800 staff that work at the Met Office. And that includes our account managers, that includes our scientists, that includes HR and legal. And I'd say there's probably less than 300 people who work in technology, as we call it, which is managing our IT estate, managing our Linux estate, managing our storage area networks because, funnily enough, managing petabytes of data is not an easy thing. You know, managing a supercomputer, a mainframe.There really aren't that many people here at the office, but we do so much great stuff. So, as a technical lead, I'm not just a leader of services, but I lead a team of people. I'm responsible for them, for empowering them, and helping them to develop their own careers and their own training. So, it's me and a team of four that look after SurfaceNet. And it's not just SurfaceNet; we've got other systems we look after that SurfaceNet produces data for. Sending messages around the world on the World Meteorological Organization's global telecommunications system. What a mouthful. But you know, these messages go all around the world. And some people might say, “Well, I got a huge team for that.” Well, [unintelligible 00:27:27]. We have other teams that help us—I say, help us—in their own right, they transmit that data.
But we're really—I personally wouldn't say we were huge, but boy, do we pack a punch.Corey: Can I just say on a personal note, it's so great to talk to someone who's focusing on building out these environments and solving these problems for a higher purpose slash calling than—and I will get letters for this—than showing ads to people on the internet. I really want to thank you for taking time out of your day to speak with me. If people want to learn more about what you're up to, how you do it, potentially consider maybe joining you if they are eligible to work at the Met Office, where can they find you?Jake: Yeah, so you do have to be a resident in the UK, but www.metoffice.gov.uk is our home on the internet. You can find me on Twitter at @jakehendy, and I could absolutely chew Corey's ear off for many more hours about many of the wonderful services that the Met Office provides. But I can tell he's got something more interesting to do. So, uh [crosstalk 00:28:29]—Corey: Oh, you'd be surprised. It's loads of fun to—no, it's always fun to talk to people who are just in different areas that I don't get to work with very often. It turns out that most of my customers are not focused on telling you what the weather is going to do. And that's fine; it takes all kinds. It's just neat to have this conversation with a different area of the industry. Thank you so much for being so generous with your time. I appreciate it.Jake: Thank you very much for inviting me on. I guess if we get some good feedback, I'll have to come on and I will have to chew your ear off after all.Corey: Don't offer if you're not serious.Jake: Oh, I am.Corey: Jake Hendy, Tech Lead at the Met Office. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment yelling at one or both of us for having the temerity to rain on your parade.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
Links: Cyber-security insurance providers are increasing their requirements to be insurable: https://Twitter.com/SwiftOnSecurity/status/1467879429707866112 “Why the C-suite doesn't need access to all corporate data”: https://www.darkreading.com/vulnerabilities-threats/why-the-c-suite-doesn-t-need-access-to-all-corporate-data “Amazon S3 Object Ownership can now disable access control lists to simplify access management for data in S3”: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-s3-object-ownership-simplify-access-management-data-s3/ Cloud provider security mistakes: https://github.com/SummitRoute/csp_security_mistakes TranscriptCorey: This is the AWS Morning Brief: Security Edition. AWS is fond of saying security is job zero. That means it's nobody in particular's job, which means it falls to the rest of us. Just the news you need to know, none of the fluff.Corey: Are you building cloud applications with a distributed team? Check out Teleport, an open-source identity-aware access proxy for cloud resources. Teleport provides secure access for anything running somewhere behind NAT: SSH servers, Kubernetes clusters, internal web apps, and databases. Teleport gives engineers superpowers. Get access to everything via single sign-on with multi-factor. List and see all of SSH servers, Kubernetes clusters, or databases available to you in one place, and get instant access to them using tools you already have. Teleport ensures best security practices like role-based access, preventing data exfiltration, providing visibility, and ensuring compliance. And best of all, Teleport is open-source and a pleasure to use. Download Teleport at goteleport.com. That's goteleport.com.Corey: re:Invent has come and gone, and with it remarkably few security announcements. Shockingly, it was a slow week for the industry. I'm glad but also disappointed to be proven wrong in my, “The only thing you, as a company who isn't AWS, should be announcing during re:Invent is your data breach since nobody will be paying attention,” snark. But it's for the best. It means that maybe—maybe—we're starting to see things normalize a bit.Now, from the Community, we saw some interesting stuff. Scuttlebutt has it that cyber-security insurance providers are increasing their requirements to be insurable. This makes a lot of sense; as ransomware attacks become more numerous, nobody is going to want to cut large insurance checks to folks who didn't think to have offline backups. You might want to check the specific terms and conditions of your policy.I also liked a writeup as to “Why the C-suite doesn't need access to all corporate data.” It's true, but it's super hard to defend against. When the CTO ‘requests' access to the AWS root account, who's likely to say no? If you're going to push for proper separation of duties, either do it the right way or don't even bother.Corey: This episode is sponsored in part by my friends at Cloud Academy. Something special for you folks: if you missed their offer on Black Friday or Cyber Monday or whatever day of the week doing sales it is, good news, they've opened up their Black Friday promotion for a very limited time. Same deal: $100 off a yearly plan, 249 bucks a year for the highest quality cloud and tech skills content. Nobody else is going to get this, and you have to act now because they have assured me this is not going to last for much longer. Go to cloudacademy.com, hit the ‘Start Free Trial' button on the homepage and use the promo code, ‘CLOUD' when checking out. 
That's C-L-O-U-D. Like loud—what I am—with a C in front of it. They've got a free trial, too, so you'll get seven days to try it out to make sure it really is a good fit. You've got nothing to lose except your ignorance about cloud. My thanks to Cloud Academy once again for sponsoring my ridiculous nonsense.Corey: And from AWS, there was really one glaring announcement that made me happy in the security context, and that was that “Amazon S3 Object Ownership can now disable access control lists to simplify access management for data in S3,” and it's huge. S3 ACLs have been a pain in everyone's side for years. Remember that S3 was the first AWS service to reach general availability, and the second in beta, after SQS. Meanwhile, IAM wasn't released until 2010. “Ignore bucket ACLs so you don't have to think about them” is a huge step towards normalizing security within AWS, specifically S3.And from the community's tools—I guess it's not a tool so much as it is a tip, or I don't even know how you would describe it, but I love it—because Scott Piper is doing the lord's work by curating a list of cloud provider security mistakes. Lord knows that none of them are going to be showcasing their own failures, or—thankfully—those of their competition, because I don't want to get in the middle of that mudslinging contest. This is well worth checking out and taking a look at, particularly when one provider or another starts getting a little too full of themselves around what they're doing in security. That's what happened last week in AWS security. Thank you for listening.Corey: Thank you for listening to the AWS Morning Brief: Security Edition with the latest in AWS security that actually matters. Please follow AWS Morning Brief on Apple Podcasts, Spotify, Overcast—or wherever the hell it is you find the dulcet tones of my voice—and be sure to sign up for the Last Week in AWS newsletter at lastweekinaws.com.Announcer: This has been a HumblePod production. Stay humble.
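For reference, the S3 Object Ownership change discussed above comes down to a single bucket-level setting. A minimal boto3 sketch, with an illustrative bucket name:

```python
import boto3

# Sketch only: setting Object Ownership to BucketOwnerEnforced disables
# ACLs on the bucket, so access is governed purely by IAM and bucket
# policies. "example-data-bucket" is a placeholder.
s3 = boto3.client("s3")

s3.put_bucket_ownership_controls(
    Bucket="example-data-bucket",
    OwnershipControls={
        "Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]
    },
)
```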
Outline

**Key Book Facts:**
* 8 chapters: ~30 pages per chapter, 240-250 pages total
* Each chapter has one or more independent code examples on GitHub
* Chapter 1: Getting started with .NET on AWS
  * What is Cloud Computing
  * Types of Cloud Computing: IaaS; PaaS, FaaS and Serverless; SaaS; MaaS
  * Key Cloud Computing Concepts: Elastic Infrastructure; Overview of Core Services
  * High-level overview of AWS: History of AWS; Global Infrastructure
  * Using AWS: Setting up an account; Using the AWS Console; Setting up and using IAM
  * Setting up and developing with the AWS C# SDK: Quickstart of a cross-platform C# app; AWS CloudShell; AWS Cloud9; Visual Studio; Visual Studio Code and Visual Studio Codespaces on GitHub
* Chapter 2: AWS Core Services
  * AWS Storage: Overview of AWS Storage; Developing with S3 Storage; Developing with EBS Storage; Using EFS Storage
  * Using EC2 Compute: Overview of EC2; Using EC2; Using EC2 Instance Types; Using EC2 Purchase Options
  * Security Best Practices for AWS: Encryption at Rest and in Transit; PLP (Principle of Least Privilege)
  * Developing NoSQL Solutions with DynamoDB: What is DynamoDB; Key DynamoDB Concepts; Build a Sample C# DynamoDB Console App
* Chapter 3: Migrating a legacy .NET application to AWS
  * Choosing a migration path: Rehosting; Replatforming; Repurchasing; Refactoring; Retire; Retain
  * Rehosting .NET Framework: App2Container
  * Rehosting .NET Core / 5: .NET Core Elastic Beanstalk
  * Replatforming: Migrating .NET Framework; Considerations for moving to .NET 5; Microsoft .NET Upgrade Assistant; AWS Porting Assistant for .NET
  * Migrating Build and Deploy to AWS: TeamCity to AWS CodeBuild; Selecting a Deploy Compute Target Environment
* Chapter 4: Modernizing .NET applications to Serverless
  * What is "Serverless" Computing?
  * Choosing the correct Serverless components for .NET on AWS: Developing with AWS Lambda and C#; Developing with AWS Step Functions; Developing services with SQS and SNS; Developing Event-Driven systems via AWS Triggers
  * Developing Serverless .NET Microservices on AWS: What is a Microservice according to AWS?; Overview of AWS Microservice options; Developing a RESTful API with AWS App Runner; Developing a RESTful API with AWS Lambda, API Gateway and SAM
* Chapter 5: Containerization of .NET
  * Developing with Containers on AWS: Introduction to Containers; Comparing Containers to Hardware Virtualization; Advantages of Containers; Building Microservices with Containers
  * Introduction to Kubernetes: What is Kubernetes?; Understanding Kubernetes on AWS
  * Developing with AWS Container-Compatible Services: Amazon ECR; Amazon ECS and Fargate; Amazon EKS; AWS App Runner; AWS Lambda
* Chapter 6: DevOps
  * Getting started with DevOps on AWS: What is DevOps?; AWS DevOps best practices
  * Developing with CI/CD: AWS CodeBuild; AWS CodePipeline; Integrating 3rd-party build servers (Jenkins, TeamCity, GitHub Actions)
  * Developing with IaC: What is IaC?; Developing with the AWS CDK for IaC; What is the CDK?
  * Working with the CDK in C#
  * Developing with Terraform for IaC
* Chapter 7: Monitoring, Instrumentation, Auditing and Testing for .NET
  * Using Amazon CloudWatch: Alarms, Logs, Metrics; Application monitoring; ServiceLens; Traces, Resource Health and Synthetic Canaries; Enabling SDK Metrics and Additional Tools
  * Using AWS CloudTrail for Security Auditing
  * Continuous Delivery Key Concepts for .NET on the SDK
* Chapter 8: Developing with the AWS C# SDK
  * Using the AWS Toolkit for Visual Studio in Depth: Configuring Visual Studio for the AWS Toolkit; Special Features of Visual Studio for the AWS Toolkit
  * Key SDK Features: Async APIs; Retries and Timeouts; Paginators
  * Working with High-level AWS Services: Using AWS Rekognition; Using AWS Comprehend; Using AWS SageMaker

If you enjoyed this video, here are additional resources to look at:
Coursera + Duke Specialization: Building Cloud Computing Solutions at Scale Specialization: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale
Python, Bash, and SQL Essentials for Data Engineering Specialization: https://www.coursera.org/specializations/python-bash-sql-data-engineering-duke
O'Reilly Book: Practical MLOps: https://www.amazon.com/Practical-MLOps-Operationalizing-Machine-Learning/dp/1098103017
O'Reilly Book: Python for DevOps: https://www.amazon.com/gp/product/B082P97LDW/
Pragmatic AI: An Introduction to Cloud-based Machine Learning: https://www.amazon.com/gp/product/B07FB8F8QP/
Pragmatic AI Labs Book: Python Command-Line Tools: https://www.amazon.com/gp/product/B0855FSFYZ
Pragmatic AI Labs Book: Cloud Computing for Data Analysis: https://www.amazon.com/gp/product/B0992BN7W8
Pragmatic AI Book: Minimal Python: https://www.amazon.com/gp/product/B0855NSRR7
Pragmatic AI Book: Testing in Python: https://www.amazon.com/gp/product/B0855NSRR7
Subscribe to Pragmatic AI Labs YouTube Channel: https://www.youtube.com/channel/UCNDfiL0D1LUeKWAkRE1xO5Q
View content on noahgift.com: https://noahgift.com/
View content on Pragmatic AI Labs Website: https://paiml.com/
Good afternoon, dear listeners. We present a new episode of the RWpod podcast. In this episode:

Ruby
- Rails 7 introduces partial_inserts config for ActiveRecord
- Kubing Rails: stressless Kubernetes deployments with Kuby
- How Lazy Evaluation Works in Ruby
- Programmers should stop celebrating incompetence
- Lambdakiq - ActiveJob on SQS and Lambda
- AutoHtml - a collection of filters that transforms plain text into HTML code
- RSyntaxTree - yet another syntax tree generator made with Ruby and RMagick
- ASMREPL - a REPL for assembly language
- Refactoring Javascript with Stimulus Values API & Defaults (video)

Web
- Web Almanac (2021)
- React state management in 2022 — Return of the Redux
- 4x smaller, 50x faster
- CSS Concepts - The one and only guide you need
- Floating UI - JavaScript positioning library for tooltips, popovers, dropdowns, and more
- Mitosis - write components once, run everywhere
- RAJI - Really Async JSON Interface: a non-blocking alternative to JSON.parse to keep web UIs responsive
- Ladda - a UI concept which merges loading indicators into the action that invoked them
- TEGA - a TypeScript library (also usable in JS) for programming and creating GameBoy ROM images that can be played in an emulator or on real hardware via a flashcart
- Chalk 5.0.0 - terminal string styling done right
- Open Source CSS Variables
In this episode, we cover:
00:00:00 - Introduction
00:05:00 - Itiel's Background in Engineering
00:08:25 - Improving Kubernetes Troubleshooting
00:11:45 - Improving Team Collaboration
00:14:00 - Outro

Links: Komodor: https://komodor.com/ Twitter: https://twitter.com/Komodor_com

Transcript
Jason: Welcome back to another episode of Build Things On Purpose, a part of the Break Things On Purpose podcast where we talk with people who have built really cool software or systems. Today with us, we have Itiel Shwartz, who is the CTO of a company called Komodor. Welcome to the show.
Itiel: Thanks, happy to be here.
Jason: If I go to Komodor's website, it really talks about debugging Kubernetes, and as many of our listeners know, Kubernetes and complex systems are a difficult thing. Talk to me a little bit more—tell me what Komodor is. What does it do for us?
Itiel: Sure. So, I don't think I need to tell our listeners—your listeners—that Kubernetes looks cool and is very easy to get started with, but once you're into it and you have a big company with a complex micro—it doesn't even have to be big; even a medium-size company with a complex system—you start hitting a couple of walls, or issues, when trying to troubleshoot Kubernetes. And that is usually due to the nature of Kubernetes, which makes building complex systems very easy. Meaning, you can deploy multiple microservices, multiple dependencies, and everything looks like a very simple YAML file. But at the end of the day, when you have an issue—when one of the pods starts to restart and you try to figure out why the hell your application is not running as it should—you need to use a lot of different tools, methodologies, and knowledge that most people don't really have in order to solve the issue. So, Komodor focuses on making troubleshooting in Kubernetes an easy—may I dare say even fun—experience, by harnessing our knowledge of Kubernetes and allowing our users to get a digested view of the world.
And so usually when you speak about troubleshooting, the first thing that comes to mind is that issues are caused by changes. And the change might be a deploy in Kubernetes, it can be a [configurment 00:02:50] that changed, a secret that changed, or even some feature flag—like a LaunchDarkly feature that was just turned on or off. So, what Komodor does is track and collect all of the changes that happen across your entire system, and for each one of your services we put together a [unintelligible 00:03:06] that includes how the service changed over time and how it behaved. I mean, was it healthy? Was it unhealthy? Why wasn't it healthy?
So, by collecting the data from all across your system—plus, we sit on top of Kubernetes, so we know the state of each one of the pods running in your application—we give our users the ability to understand how the system behaved, and once they have an issue, we allow them to understand which changes might have caused it. So, instead of bringing up dozens of different tools and trying to build your own mental picture of how the world looks, you just go into Komodor and see everything in one place.
I would say that even more than that, once you have an issue, we give our best effort to help you understand why it happened. We know Kubernetes; we've seen a lot of issues in Kubernetes.
We don't try complex AI solutions or anything like that, but using our very deep knowledge of Kubernetes, we tell our users: FYI, your pods are unhealthy, but the node they are running on just got restarted, or is under disk pressure. So, maybe they should look at the node. Like, don't drill down into the pod's logs; instead, go look at the nodes—you just upgraded your Kubernetes version, or things like that. So, basically, we give you everything you need in order to troubleshoot an issue in Kubernetes, and we give it to you in a very nice and informative way. So, our users just spend less time troubleshooting and more time developing features.
Jason: That sounds extremely useful, at least from my experience operating things on Kubernetes. I'm guessing that this all stemmed from your own experience. You're not typically a business guy; you're an engineer. And so it sounds like you were maybe scratching your own itch. Tell us a little bit more about your history and experience with this?
Itiel: I studied computer science, and I started working for eBay, where I was on the infrastructure team. From there I joined two Israeli startups, and—I learned that the thing I really like, and do quite well, is troubleshooting issues. I worked on very, very production-downtime-sensitive systems—systems where, when the system is down, it just costs the business a lot of money. So, in these kinds of systems, you try to respond really fast to incidents, and you spend a lot of time monitoring the system so that once an issue occurs you can fix it as soon as possible. So, I developed a lot of internal tools for the companies I worked for that did something very similar: once you have an issue, they allow you to understand the root cause, or at least to get a better picture of how the world looks in those companies.
And we started Komodor because I also used to give advice to people. I really like Kubernetes. I liked it a couple of years ago, before it was that cool, and people would just consult with me. And I saw the lack of knowledge and the lack of skills that most people running Kubernetes have, and I saw—I have to say, it's like giving a baby a gun. So, you take an operations person who doesn't really understand Kubernetes and tell him, "Yeah, you can deploy everything, and everything is a very simple YAML. You want a load balancer? It's easy. You want persistent storage? It's easy. Just, like—Helm install Postgres or something like that." I've installed quite a lot of those Helm recipes—"HA, highly available." But things are not really highly available most of the time.
So, it's definitely scratching my own itch. And my partner, Ben, is also a technical guy. He was at Google, where they have a lot of Kubernetes experience. So, together, both of us felt the pain. We saw that as more and more companies moved to Kubernetes, the pain just became stronger. And as the shift-left movement is taking off, and we see more and more dev people who are not necessarily that technical being expected to solve issues—then again, we saw an issue.
So, what we see is companies moving to Kubernetes without the skills or knowledge to troubleshoot it. And then they tell their developers, "You are now responsible for production. You are deploying? You should troubleshoot," and the developers really don't know what to do. And we come to those companies, and basically it makes everything a lot easier. You have an issue in Kubernetes?
No problem—go to Komodor and understand the probable root cause. See: what's the status? When did it change? When was it last restarted? When was it unhealthy before today—maybe an hour ago, maybe a month ago? So, Komodor just gives you all of this information in a very informative way.
Jason: I like the idea of pulling everything into one place, but I think that obviously begs the question: if we're pulling in this information, we need to have good information to begin with. I'm interested in your thoughts: if someone were to use Komodor, or just wanted to improve their visibility into troubleshooting Kubernetes, what are some tips or advice you'd have for them—maybe how to set up their monitoring, or how to tag their changes, things like that? What does that look like?
Itiel: I would say the first thing is using more metadata and tagging capabilities across the board. It can be on top of the monitors, the system, the services—you name it, you should do it. Once an alert is triggered, you don't necessarily have the perfect playbook to go to, because it doesn't really exist. You should understand the relevant impact, which system it affected, who the owner is, and who you should wake up, like, now—who should look at it? So, spending the time tagging some of the alerts and resources in Kubernetes is super valuable. It's not that hard, but by doing so you've just reduced the mental capacity needed to troubleshoot an issue. More than that, here at Komodor we read this metadata—labels and tags—and harness it for our own benefit. So, it is best practice to do so, and Komodor also utilizes this data.
For example, on an alert, record the name of the team that's responsible, and for each service in Kubernetes, write down the team that owns it. This way you can basically understand which teams are responsible for which services or issues. So, this is the number one tip or trick. And the second one is to just spend time on exposing this data. You can use Komodor—I think it's the best solution—but even if not, try to have those notifications every time something changes.
Write those webhooks into each one of your resources and let the team know when things change. If not—what we see in companies is: something breaks, no one really knows what changed, and at the end of the day they're forced to go into Slack and ask, "Hey—did someone change something that might have caused production to break? If so, please fix it." It's not a good place to be. If you find yourself asking questions over Slack, you have an issue with your system's monitoring and observability.
Jason: That's a great point, because I feel like a lot of times we do that. And so you look back into your CI/CD logs—what pushes were made, what deploys were made—and you're trying to parse out which one it was. Especially in a high-velocity organization with multiple changes: which one actually did the breaking?
Itiel: We see it across the board. There are so many changes, so many dependencies. Because microservice A talks to microservice B, which talks to microservice C using SQS or something like that. And then things break and no one knows what is really happening. Especially the developers—they have no idea what is happening. But most of the time, neither do the DevOps folks themselves.
Jason: I think that's a great point of, sort of, that shared confusion.
As we've talked about DevOps and the breaking down of the walls between developers and operations, there was always this "well, you should work together," and there's this notion now that we're working together, but nobody knows what's going on. As we talk about this world of sharing, what's some of your advice, as somebody who has helped both developers and operations? Aside from getting that shared visibility for troubleshooting, do you have any tips for collaborating better, to understand as a team how things are functioning?
Itiel: I have a couple of thoughts in this area. The first thing is that you must have alignment. Both the DevOps—or operations—side and the developers need to understand that they are in this together. Without this baseline, you see organizations struggle: the developers say, "yeah, it's the ops problem if production is down," and the ops people are angry at the devs, saying they don't understand anything, so they shouldn't be responsible for issues in production. So, first of all, let's create the alignment. The organization needs to understand that both the dev and the ops teams have to take shared responsibility for the system and for the troubleshooting process.
Once this key pillar is out of the way, I would say: add more and more tools, and make sure those tools can be shared between the ops and dev teams. Because a lot of the time we see tools that are designed for the DevOps side, and a developer doesn't really understand what is happening there, what those numbers mean, and basically how to use them. So, I think making sure the tools fit both personas is a crucial thing. And the last thing is learning from past incidents. You are going to have other incidents, other issues. The question is: do you understand how to improve for the next time this incident, or a similar incident, happens? Which processes and which tools are missing in the link between the DevOps side and the system, to optimize it? Because it's not like you snap your fingers and everything works as expected. It is an iterative process, and you must have the state of mind of: OK, things are going to get better, step by step, and so on. So, these are the three most important things: one, make sure you have that alignment; two, create tools that can be shared across different teams; and three, learn from past incidents and understand that this is a marathon, not a sprint.
Jason: Those are excellent tips. So, for our listeners: if you would like a tool that can be shared between devs and DevOps or ops teams, and you're interested in Komodor—Itiel, tell us where folks can find more info about Komodor and learn more about how to troubleshoot Kubernetes.
Itiel: So, you can find us on Twitter, but basically at komodor.com. Yeah, you can sign up for a free trial. The installation is, like, 10 seconds or something like that—it's basically a Helm install—and it really works. We just finished a very big round, so we're growing really fast and we have more and more customers. So, we'll be happy to hear your use case and see how we can accommodate your needs.
Jason: Awesome. Well, thanks for being on the show. It's been a pleasure to have you.
Itiel: Thank you. Thank you. It was super fun being here.
Jason: For links to all the information mentioned, visit our website at gremlin.com/podcast.
If you liked this episode, subscribe to the Break Things on Purpose podcast on Spotify, Apple Podcasts, or your favorite podcast platform. Our theme song is called “Battle of Pogs” by Komiku, and it's available on loyaltyfreakmusic.com.
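Itiel's tagging advice from this episode comes down to a few lines of Kubernetes metadata. A minimal sketch, with made-up team names: ownership carried as labels on a Deployment, so that alert routing and tools like Komodor can read who owns what. This is an illustration of the practice he describes, not Komodor's own configuration:

```yaml
# Sketch: ownership labels on a Deployment, so alerts can be routed to the
# right team. The service and team names are purely illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service
  labels:
    app: checkout-service
    team: payments            # who owns this service
    oncall: payments-oncall   # who to wake up when it breaks
spec:
  selector:
    matchLabels:
      app: checkout-service
  template:
    metadata:
      labels:
        app: checkout-service
        team: payments
    spec:
      containers:
        - name: checkout
          image: example.registry/checkout:1.0.0
```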
Hey everyone, we wanted to post an update on the future schedule of SQS and why we have been delayed.
[Link to mp3 file] Hello and welcome to episode number 424 of Reversim Podcast (רברס עם פלטפורמה) - a palindromic number came out, how cool!...
(Ori) 424?!...
(Ran) 424... Today's date is October 17th, the time is almost 21:30, and the year is 2021.
(Ori) And the temperatures started dropping today; there was even rain...
(Ran) There was rain today, right, finally... Today we have the honor of hosting Ilan and Or from the company Melio - is it "Mi-lio" or "Me-lio"?...
(Ilan) That's actually a very good question, because when we founded the company we really called it "Me-lio," but when we started talking to people in the US, they told us there's such a thing as a long e and a short e, which is something we didn't know... so some pronounce it "Mi-lio" and some "Me-lio." As far as we're concerned, it's "Mi-lio."
(Ori) That was probably just the domain that was available...
(Ilan) Actually, the domain that was available was Mi-lio(paymnets.com), but as we managed to grow and decided the name was really something we were at peace with - there was a whole process there too, but that's a story for another podcast - we bought melio.com, which was a bit expensive, but we managed to get it: four letters and a .com; quite an event...
(Ori) OK, so that was Ilan...
(Ran) Yes - we're doing a reversed podcast today, starting from the end... So, we're happy and honored to host Ilan and Or from Melio. We'll talk about Melio, about the payments platform, and about the technology you built to actually make all of this happen. But before that, let's get to know you. Ilan, please:
(Ilan) Thank you very much for hosting us - a great honor. I've known Ori and Ran for several years now; it's an honor to come to the podcast. I'm Ilan, one of the co-founders and the CTO of Melio. We officially founded the company about three and a half years ago - a bit before that we were still working in the garage, figuring out what we wanted to do; I'll tell you a bit more about that shortly. Fairly intense years, these last ones. Before that I was the VP Engineering at a company called Winward, and before that I worked at Outbrain for a bit over two years... And with me here is Or...
(Ran) Or - welcome!
(Or) Thank you very much - at Melio today I'm the Principal Engineer; I joined pretty much right at the beginning. Before that I was a co-founder at another startup, and before that I was a consultant at a small, let's call it "DevOps boutique," called FewBytes. After I left my startup as a co-founder - I was a really bad co-founder - someone matched me up with Ilan, and honestly, from the very first conversations with Melio's founders, it was... I can say personally it was a kind of "love at first sight"; I was completely taken with them and said "I want to work here," and the rest is more or less history...
(Ran) So, Melio - I assume some have already heard the name, but for those who haven't yet: what does Melio do?
(Ilan) We developed, and are developing, a platform for small businesses, for transferring payments. However surprising it may or may not be to some of the listeners, payments - mainly in the US - are still almost all moved on pieces of paper, namely checks... On the order of $18 trillion moves between small businesses in the US every year; on the order of five billion checks are written between businesses. That's the order of magnitude we saw four years ago, and we said, "wait - this makes no sense." In a world where digital payments happen between friends - that is, moving money between friends & family today happens very simply, there are many apps you can use to move money super easily. If you're a consumer doing an online checkout, the process is very, very advanced - the whole eCommerce world: you can check out with Stripe, or with credit cards, or Affirm, or Klarna, or any other checkout method. Still, supplier payments are almost all moved on pieces of paper - checks, bank transfers - through tools that are outside... essentially the banks' own system tools.
(Ran) Even in venture capital funds they say "I'll write you a check"...
(Ilan) "I'll write you a check," yes... and we at Melio today also write checks... in fact, we have suppliers who want to receive only checks...
And this problem looked quite interesting and very challenging to us - we said, "how can it be that, in a world where payments are massively shifting to digital, the supplier-payments world still sits on pieces of paper?"...
(Ran) What kinds of businesses are we talking about? Hair salons, veterinarians...?
(Ilan) We're talking about almost every kind of business - like you said: hair salons, vets, restaurants, doctors' offices of all sorts, professional services, photographers...
(Ori) Goldman Sachs?
(Ilan) Goldman Sachs... those too, the big ones... And indeed, companies like Nike or the Fortune 500s have tools today for both procurement and payments. But when you go to the small business - what's called an owner-operated business - the owner today has no suitable tool to manage his supplier payments. And what characterizes those businesses is that they don't have a bookkeeper or some accounts-payable expert who does the payments for them. At Melio today, or at Outbrain, there's a finance department that deals with accounts payable; but if I'm a small business, say a restaurant owner with fifteen employees, whoever handles this is usually either me or some trusted employee. And today, an average small business - the businesses we're targeting, with 5-10 employees and on the order of $1-2 million in annual revenue - sends out on the order of 50-60 payments a month. And that whole "event" is usually heavy... usually done manually...
(Ran) We'll dive into the technology story there in a moment, but still, to understand the background a bit: does a person just wake up one morning feeling he has to build a payments system? I mean - how does a nice Jerusalem boy like Ilan, one of the company's founders, decide he wants to stand up a payments system for businesses in the US?
(Ilan) What mainly drew us was the size of the opportunity. We said: wait, small businesses - pardon the cliché, but they're the backbone of the economy. At the end of the day, the way they operate - both at the operational level of sending out the payments, and what that operation drags along with it... an inefficient operation leads to very poor cash-flow management for the business. Small businesses - if you look at the reasons businesses usually close, some give bad service or a bad product, but very often it stems from not knowing how to manage the "cash-flow event" properly. And that often happens because of a lack of operational capability, and of understanding what actually has to go out today, what can go out tomorrow, and what can be managed more intelligently. What really fired us up, what made us ask "how can we help small businesses?" - was the idea of taking their payments world into the digital world, and above all managing the cash flow.
(Ran) OK. I'm no great expert on the world of finance, but I do know there are a few large companies and payment providers - I think you mentioned Stripe, and there are all sorts of other big ones...
(Ori) ...between the CRM and the financial management - CRM is more for the customer side...
(Ran) ...right; let's set the business story aside for a moment - I assume there's a reason Stripe doesn't fit them, but you also decided to create an in-house payments system, that is, to manage everything yourselves.
Why do that, and why not use some third party - some bank, Stripe, or anything else like that?
(Ilan) Before I answer that question, let me take a step back. The basic reason businesses today mostly pay their suppliers with checks is, more often than not, the lack of agreement between how one side wants to pay and how the other side wants to receive the money. Today, when I go and check out online, and there's some Stripe checkout, I can pay by credit card and the other side receives it into their bank account. There's some "real estate" - the point of sale - that can process my credit card, and the other side gets the money. In the B2B world, most transactions - the lion's share of transactions - happen OTC, over the counter. There is essentially no point of sale today - not for the purchase and not for the payment - and the point of sale that does exist is the invoice. When I order, for example, ten kilos of salmon from my fish supplier for the restaurant, together with the fish I get an invoice, and there I'm supposed to pay for the fish at some net terms. Now - I'm a fish supplier who's been in the market for 20 years, and I receive money from hundreds of customers - and at some level I don't necessarily want to support yet another payment method, because the whole finance process I've built, the whole reconciliation process I've built, is built on top of checks arriving to me. I know how the money arrives and how to tie it to the right invoice. But that restaurant that just opened, a new restaurant, doesn't necessarily want to pay by check - it wants to pay by credit card, by bank transfer... The two sides don't agree on the payment method.
(Ori) ...and then they drop down to the lowest common denominator - which is checks...
(Ilan) Exactly... and so they arrive precisely at the lowest common denominator, which is checks. A check - obviously it's accepted everywhere, obviously it's a "joker": you can just hand it over, and it's a kind of standard...
(Ori) It's paper - you can wrap fish with it...
(Ran) There's lower still - there's cash... but we haven't gone down that far... there are gold coins...
(Ori) Paper...
(Ran) So you essentially decided you're building a kind of transpiler - something that translates digital to paper, paper to digital, or whatever other translations exist...
(Ilan) Exactly. And to your question of why we actually built a payments infrastructure: to get to a point where we could come and serve those businesses, we had to create a kind of "independence" between the sides - decoupling between the payer and the receiver. So we actually built a new payments infrastructure, today already over three banks - Evolve Bank & Trust, Silicon Valley Bank, and JPMorgan Chase. We built the ability to collect money from the payer any way we want - it can be a credit card, a debit card, a bank, PayPal, Apple Pay... we can collect money in any possible way - and pay it out on the other side in any way the other side wishes. We effectively created a disconnect between the two sides - which today gives us a lot of power to come to a business - to a restaurant, or some photographer, or a hair salon, or anywhere else - and say: "OK, it doesn't matter now; you don't need to convince the other side how to receive the money. Give it to them in whatever way they wish, and you pay however you want."
(Ori) Is there also, like, "the Melio way" - some card, or a sort of bit-style thing [bit is an Israeli P2P payments app], a clearing app where, if it suits both sides, all the better, but you can also receive and pay through it...?
(Ilan) So the way we work today... in the example I gave, say I'm the restaurant: I can choose to pay by credit card, or choose to pay by bank - by bank transfer - and you'll receive it however you wish, say checks, or a bank transfer, or any other way. We are creating... we will actually create a sort of...
...if I understand your question correctly, a kind of Wallet - so that, once both sides are in the network, we can move money through a wallet that each one can use within the network itself. Now, one more thing we created in this payments infrastructure: unlike systems like bit or Pepper, or Venmo or PayPal in the US, where both sides have to be in the network for one side to pay the other - we created the ability to run what we call an Open Network: only one side has to be in the network for the other side to receive the money. That way, we took the burden off whoever is currently using us of having to convince the other side to join.
(Ran) Right. So you decided, and understood, that you want to offer a terribly flexible payments system that's open - you can pay however you want, and you can receive the money however you want. Then you come to Or, "the poor programmer," and tell him: "Or, come build me one of these!"... Where do you start? What are the challenges here? How do you even start building such a payments system from scratch?
(Ori) So Or sends him an invoice...
(Or) So really, Melio is a bit more than a payments system, obviously - a very large part of the system is built on a really convenient interface. Because it's small businesses, because they don't have that much time to deal with some complex business-grade system of the kind usually aimed at enterprises. Broadly, what I proposed to Ilan when we started was: "Ilan, listen - we are now on the verge of Serverless. We have an opportunity not to maintain servers! We have an opportunity to enjoy the benefits..."
(Ran) ...and that from someone who has already maintained plenty of servers, as you said in the intro...
(Or) Exactly - what was in my head was that I never, ever want to configure NTP again in my life... So I told Ilan: "Let's do Serverless! Let's go for it and see if it works for us." [one of ours!]
And Ilan went along with me... at first he gave me a face like "you think? are you sure?", but we said "come on, let's go for it"...
(Ran) "It's not hype? Not like GraphQL, which will pass soon?..."
(Or) Exactly... He really did have some doubts at the beginning, and I told him: "Look - trust me! Whatever doesn't work, we'll fix." And then we really built the system - our payments processing actually runs serverless. The truth is that a very large part of Melio's infrastructure runs only on Serverless - only on Lambda, specifically on Lambda.
(Ran) And is the motivation really "I don't want that NTP headache," or are there other, architectural reasons too?
(Or) It's very...
(Ori) ...it's very stream-oriented, right? It's processing of streams of data, and that sounds like a fit...
(Ilan) That's a very important point you just made. At the end of the day, payments mostly go out either as bank transfers on one side, or as checks... we still send out checks - Melio today sends on the order of hundreds of thousands of checks every month, because the suppliers still want to receive checks... The payment-collection process really is stream-oriented; that is, I can enter the system and schedule a payment. I can schedule it for right now, I can schedule it for today, or tomorrow, or a month from now - but at the end of the day, all or most of the payments converge on a single point in time. Ultimately, to move money by bank transfer or by check is actually batch-oriented; that is, everything concentrates at one point, because the banks ultimately work with cut-offs... Meaning, when I want to move money from point A to point B, there's the bank's cut-off - the bank's cut-off is at 23:00 or 24:00 Central Time in the US - and at that point we take all the payments scheduled for today, or for whatever date we want, and send them out. Which means the system receives events, receives commands, as a stream - but at the end of the day it funnels down to a single point, at which we have to upload that file, that ledger, to the bank, in order to execute the various payments - or send out checks, or...
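To make the stream-in, batch-out shape Ilan describes concrete, here is a minimal sketch of a scheduled Lambda that gathers everything due at the cut-off and renders one ledger file. All names (table, index, bucket, fields) are hypothetical; this illustrates the pattern, not Melio's actual code:

```typescript
// Sketch of the batch-at-cut-off pattern: payments arrive as a stream,
// but money moves in one nightly batch. Names are made up for illustration.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const s3 = new S3Client({});

// Triggered once per banking day (e.g. by an EventBridge schedule just
// before the bank's cut-off), not once per payment.
export async function handler(): Promise<void> {
  const today = new Date().toISOString().slice(0, 10);

  // Collect every payment scheduled for today's cut-off.
  const { Items: payments = [] } = await ddb.send(
    new QueryCommand({
      TableName: "payments",                  // hypothetical table
      IndexName: "byDueDate",                 // hypothetical GSI
      KeyConditionExpression: "dueDate = :d",
      ExpressionAttributeValues: { ":d": today },
    })
  );

  // Render one ledger file for the whole batch and hand it to the bank's
  // pickup location; the Lambda itself stays stateless.
  const ledger = payments
    .map((p) => `${p.paymentId},${p.amountCents},${p.method}`)
    .join("\n");

  await s3.send(
    new PutObjectCommand({
      Bucket: "bank-outbox",                  // hypothetical bucket
      Key: `ledger-${today}.csv`,
      Body: ledger,
    })
  );
}
```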
(Ori) So I gather the nature of these transactions is one that requires almost no state...
(Or) Right. I have to admit that at first the motivation was very much "manage as little as possible," and little by little - actually pretty quickly - we saw it plays in our favor in other cases too. You have to understand that this market is terribly convenient; in certain senses it's very convenient and, let's call it, "privileged" for us as a business - because these are businesses, so they work 9:00 to 17:00; that's most of the load on our system. They don't work on Saturday, they don't work on Sunday. The banks don't work on Saturday or Sunday - so we don't do processing on those days. We have the very great privilege of operating the system only at certain times, and even within those days - only at certain hours.
(Ran) If only it were all on Israel time, it would be ideal...
(Or) Yes, that would have been perfect... So in that sense, Serverless helped us a lot, because if we take just the payments processing for a moment: 90% of the day it's at zero; nothing happens... maybe there are various management and logistics tasks running in the background, but apart from that - nothing. And then, at a certain trigger during the day, the system starts working, does all the processing it needs to do - and goes back to sleep.
(Ori) It reminds me a bit of that scene in "Ramzor" where he teaches a dance at an accountants' office - "What happens all month? Nothing-nothing-nothing... the 15th of the month?! Ohhh..."
(Ran) So you're saying that the lovely ability of Lambda functions to scale up immediately and then shut down to almost zero is one architectural advantage... By the way, about the state you mentioned here - at least the way I imagine it, in payments specifically I'd imagine there is a great deal of state, only it always has to be persistent. That is, it's never in-memory, because you must not lose it... So maybe it's not right to say "there's no state"; rather, the state always has to be persistent...
(Or) Right - our state, in our case, let's say... we're not "classic Serverless," call it that. Our state sits in a transactional database; our transactions are inside the Lambda - obviously, part of the processing of what the Lambda does, part of what we do with the banks, happens as part of the transaction running against the database. In other words, we treat the Lambda as completely volatile: if it dies, nothing happens, in the sense of "no money will move anywhere." And the state itself really stays in the database.
(Ran) What do the APIs against those banks look like? I remember, the last time I did some payment integration, it was some horrific CORBA with Perl and things like that... What's the state of the art today?
(Or) So really, payments with the banks is an entire story of its own that could fill an episode... Let me spell out the acronym ACH - it's Automated Clearing House, which is essentially automation for an antique thing called a Clearing House...
(Ran) ...and nothing there is automatic...
(Or) ...and the automation... I'll give two anecdotes, but broadly, it's a file with a huge number of records inside. You put it on some server on the other side of the world - and it's a "black hole"... There's nothing - no request-response model... There is a certain response, but it doesn't exactly tell you "ah, yes - we just transferred everything!" You only find out after a few days which of the records failed. Whatever didn't fail - succeeded... That's roughly the model of this protocol.
(Ran) Probably...
(Or) "Probably"... exactly. Now, while working with it, you tell yourself: OK, it's a model where... there's some computer over there going over the records one by one, the processing passes them along and returns to us what failed and what didn't. And then we discovered that one of the transactions that came back as failed - we saw some metadata inside, which we keep in order to map it back to our internal transactions and so on - and on our side it started with, say, a lowercase "t" and a very long number. And a transaction came back that we didn't recognize - we only identified it when we looked at it by eye: it was an uppercase "T" and a very long number... And then we understood that somewhere in the chain of banks, there's simply some person who typed a "T"... A computer doesn't confuse "t" with "T"; they're two completely different things. But a person typing a T into some Excel or email or whatever - it probably flipped once to an uppercase T, stayed uppercase from there on, and came back to us with a capital letter...
(Ran) That was the day the coffee spilled on his Shift key...
(Or) Something like that, exactly...
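For a feel of what reading such a bank file can look like, here is a toy sketch over NACHA-style ACH records: fixed-width 94-character lines, dispatched by the record-type code in the first column. The field offsets follow the standard entry-detail layout but are simplified; treat this as an illustration, not a production parser:

```typescript
// Toy sketch of scanning an ACH (NACHA) file: 94-character fixed-width
// records, with '6' marking an entry-detail record. Offsets simplified.
interface AchEntry {
  transactionCode: string;
  amountCents: number;
  individualId: string; // where caller metadata (like that "t..." id) rides
}

function parseAchEntries(file: string): AchEntry[] {
  const entries: AchEntry[] = [];
  for (const line of file.split("\n")) {
    if (line.length !== 94) continue; // every NACHA record is 94 chars
    if (line[0] !== "6") continue;    // '6' = entry detail record
    entries.push({
      transactionCode: line.slice(1, 3),
      amountCents: parseInt(line.slice(29, 39), 10), // amount, in cents
      individualId: line.slice(39, 54).trim(),
    });
  }
  return entries;
}
```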
(Or) So that system is sort of semi-automatic: things are triggered automatically, but there's a great deal of human work in there. And the errors that come back are often human work too - all kinds of mappings where they look at the owner of the bank account; many times it's by name... they literally map it by name, and many times they can't find the number... There really are a lot of processes there, and maybe I'd put it like this: even when the system's inputs come from a machine, we treat them as human inputs too, to make sure we don't get burned in that case either.
(Ori) But wait - earlier, Ilan talked about you sending out tons of checks. I mean... are there literally printers printing paper? Serverless from here to next Tuesday, but printers...
(Ilan) And how...
(Ran) Surely there's an Amazon service that prints checks, no?...
(Ilan) (a) True, there is an Amazon service that prints checks [?], but we use the banks' check-printing services. Or talked about the ACH file, which is an encoded file you send in order to execute bank transfers. There's a file with a different, slightly more modern format, in JSON, that you upload to the banks and they print checks. We essentially give the bank an instruction - you need to deliver this by a certain hour, the file itself - telling them "here are the details" - and on the other side there are printers, and out come checks... Then they go into envelopes, move on to USPS - and reach their destination...
(Ori) I have to say... at Outbrain, say, when we make payments to suppliers - and it's a great many suppliers, twice a month - we work with some horrible system called Masav, know it?
(Ilan) [nodding, presumably in despair]
(Ran) Israeli?
(Ori) Yes, the "Banks' Clearing Center" or something like that...
(Or) From anecdotes I've heard, it's quite reminiscent of the ACH structure, in a way...
(Ori) Personally, I'd rather go back to checks after the experience with that Masav... You upload some Excel file there; it's probably the same thing... Terrible.
(Ran) Still safer than paying employees' insurance, which also usually doesn't arrive, but never mind...
(Ori) True... but there too... sometimes the names get swapped between the sides, because everything's in Hebrew and you have to read it like Greek, and...
(Ran) So today you're a payments expert, Or? You spend your days and nights deciphering files like these?
(Or) Let's say, "luckily" in quotes, there's a much larger team dealing with it now. I did it for a fairly long stretch... It's like in The Matrix, where he sees the code and knows what's behind it without looking at the pictures? Same thing - I look at an ACH file and I know: this is the number of such-and-such, this structure is a name...
(Ran) If it's a capital "T," then Rachel was on shift today...
(Or) Yes... With checks too, by the way, it's very... again, checks are a human process - they're sent by mail, so they get lost. There are also things like... for example, when we started, we sent envelopes in the wrong color... we sent purple check envelopes, Melio purple... and discovered that some people simply put the check aside and do nothing with it, because they think it's advertising... So we changed it to white - and suddenly people did deposit the checks... There are all kinds of things like that... It really is this mix of a human process with things we generate automatically - you put some JSON or not-JSON into an API somewhere, but in the end people on the other side have to take an action - and that makes everything much more complex.
(Ori) There's something that interests me - you talked about making a payment, collecting on one side and paying out on the other - and you're used to the transaction closing, right? Now, "closing" means "the money moved," I know, but it's not exactly like that - the money didn't move; you only passed the file on to someone else's processing, or the check is in the mail... and there's no feedback loop.
(Ilan) There's no feedback loop, that's true. And in certain cases, as Or said - with a bank transfer, with ACH - the protocol works by saying "as long as I haven't come back to you, everything is fine; and if I came back to you with an error, here are the things that failed." But we did build a system - at the end of the day, we now move money at a rate of tens of billions of dollars a year, we have tens of thousands of customers, and everything must balance. The SLA is very, very important. At the end of the day, we cannot afford a dollar not reaching one side, or a payment or two dropping, because at the end of the day these are businesses whose money didn't reach their suppliers, and that's a whole lot of relationship between the business and the supplier. So we built systems that sit outside the payment processing and check that the books are balanced: they go into the banks and take files that arrive to us outside the transactions, in order to balance the books - to see that everything we sent out... we then query the bank, so we understand... everything we sent the bank as an instruction, when we later query the bank, we verify the bank really sent it out. All the FinOps systems that we...
(Ran) I'd guess, by the way, that this is a significant added value, beyond the technical ability to move a payment - verifying it balances, verifying things went through; I'm guessing that's added value... I can say, again - going back to the insurance-company anecdote - I remember a meeting with an insurance agent who promised me, "here, there's a computer that checks!" So I asked him, "what, sometimes there's no computer that checks?" And he told me, "no... at the other companies it's people; with me, it's a computer!" So welcome to the twentieth century...
(Ori) But you're saying: "I execute the transactions, and afterwards I have this sweeper that goes over them and checks that all the transactions - that 'the bank really paid this'" - and that's what actually closes the transaction.
(Ilan) It comes out as a report in our system. We run checks on our side already during the file uploads - failures of ours can happen there too: there are a lot of Lambdas running, a lot of files; we do a MapReduce-style process that goes line by line through the files, opening them with the Lambdas we have. At the end of the day, we have to be sure that everything we read from the database goes up into the banks - even before we can close the transaction. There, too, we developed a capability where we said we don't wait - because there are no errors and no feedback...
(Ori) It's not that there are no errors - there are no error messages...
(Ilan) There are no error messages, exactly. So during the file-upload process, we constantly check what we uploaded against what was written in the database - because the process is external and separate - to make sure things are balanced. Only at the stage after that is there a process that queries the banks and checks what we actually uploaded, and then sees that everything balances.
(Ori) Down to a lowercase "t" and an uppercase "T"...
(Ilan) ...which only Or catches, yes...
(Or) There really is... you could say we do reconciliation at several different levels, from several different checkpoints along the process: right after we move the money; a few days later, when the bank informs us, after the fact, what succeeded and what didn't; and later still against the bank's final balance, which we see... We really try to get the complete picture, because again, as Ilan said - we cannot allow a situation where, because of us, our customer doesn't pay some other bill, because then he's in trouble with his supplier again, he now has cash-flow problems... For him it's 100% - one payment, for him... For us, one payment is a fraction of a fraction - for him it's 100% of what he's dealing with.
(Ori) Sometimes it also hurts his credit ratings, or, like... his credit score.
(Or) It can...
(Ilan) It's the relationship with the supplier... it's the relationship with the supplier, where he tells him, "The cheque is on its way" - and it isn't really on its way, and then the relationship between them can be damaged.
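The "sweeper" idea Ori and Ilan discuss reduces to a set-comparison between what was instructed and what the bank reports. A minimal sketch, with hypothetical record shapes; note the deliberately exact, case-sensitive id match, in the spirit of the "t" versus "T" anecdote:

```typescript
// Sketch of a reconciliation pass: compare what we instructed the bank to
// pay against what the bank reports it actually paid. Shapes are made up.
interface LedgerRow {
  paymentId: string;   // our internal id (case-sensitive on purpose)
  amountCents: number;
}

function reconcile(
  instructed: LedgerRow[],
  bankReported: LedgerRow[]
): { missing: LedgerRow[]; mismatched: LedgerRow[] } {
  // Index the bank's view by id; an exact match, so a lowercase "t" and an
  // uppercase "T" are different ids, as the anecdote shows.
  const byId = new Map(bankReported.map((r) => [r.paymentId, r]));

  const missing: LedgerRow[] = [];
  const mismatched: LedgerRow[] = [];

  for (const row of instructed) {
    const reported = byId.get(row.paymentId);
    if (!reported) {
      missing.push(row); // we instructed it; the bank never reported it
    } else if (reported.amountCents !== row.amountCents) {
      mismatched.push(row); // reported, but the books don't balance
    }
  }
  return { missing, mismatched };
}
```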
(Ran) What does the rest of the technology story look like? That is, does the fact that you operate in this domain, finance, have technological implications - for instance, which languages do you write in? What does security mean for you? What other technological implications are there?
(Or) Language-wise, we're quite "standard," let's call it, at least for today's industry. We work in JavaScript - also a bit of Python in various places around the system - but broadly, most of the system is written in Node. That works for us simply because Lambda and Node are very, let's call it, "native" together in the runtime. We didn't try to get too clever there - we test ourselves as much as possible. Security-wise, Lambda again plays into our hands here: there's no server... there isn't even a port to break into - it doesn't exist as a concept... It also helped a lot with compliance, by the way - everything related to Serverless. When we went through compliance - we've already been through two processes - there are simply companies...
(Ori) Who's the body that asks you for compliance? Compliance with whom?
(Or) ISO 27001 compliance...
(Ori) Which is more financial, or...
(Or) That one is more European, if I'm not mistaken...
(Ilan) The European one is mainly security, and now we're in the process - finishing it - of SOC 2 Type 2. Who demands these regulations of us? (a) Our partners - the banks we work with, the rails over which we move the money - and partners... Melio, in the end... we haven't touched on this yet, but let's come back to the technology part for a moment: Melio has two main product lines. The first line is the stand-alone experience. The second line is the Platform - "the ability to embed the experience in someone else's real estate." Our biggest partner today is Intuit, with QuickBooks - they essentially put our capabilities inside QuickBooks - and a partner receiving a financial service wants to know we're well secured.
(Ran) So you were saying... why does it lower the attack surface?...
(Or) We skipped over that... Also within the compliance there are whole sections on port management and all kinds of things like that, at the level of machines and servers, that we simply get to skip... It shortens the compliance time a great deal - surprisingly so... it surprises the other side, the one doing our review, how much we cut out. It was very convenient in that context.
(Ran) Although, you know - I assume this compliance will shift with time and get used to it, and discover that with Serverless, too, you simply need to check other things... No more open ports, OK... no more file descriptors, but there are other things...
(Or) There are dependencies, there's static code analysis... there are still plenty of APIs exposed out to the world, obviously...
(Ran) I take it compliance hasn't gotten there yet...
(Or) We try to take care of as much as possible ourselves, because again - compliance matters to us because it matters to our partners, and it matters for how we're perceived. It matters to us that nothing ever happens to us, to protect all of our customers, so there's the aspect of whether we feel responsible enough to do it ourselves. Yes... And in that context, the use of Lambda, and of Serverless in general - I want to say a word about Serverless: I always hear "Serverless, Serverless"... When we started working with it, I was less interested in the fact that it's Server-less; I often called it Management-less... There is a server, it exists - there's a Lambda, it's a server, there's an instance; we have a connection to the database that we reuse; there's RAM and we keep all sorts of things in it; there's CPU... There's everything. From our point of view, it behaves a bit like a server that runs code at checkpoints - only we don't manage it. So on the spectrum, we do look at Lambda as the Holy Grail in that sense of Management-less; below it we have Fargate, we have the rest... So we're not pure-Serverless - we use whatever suits us at a given point in time.
(Ran) How does this affect the development experience? I mean - if I'm now fixing some bug in a service, which is presumably one part among 70 other components - how do I develop it? How do I test it?
(Or) Right - that's one of the first things I thought about, too, when we said "let's do Serverless"...
So we currently have two approaches - one that's a bit more "legacy" within the company, and a newer one that we're starting to roll out internally. The first approach, which still serves a large share of the services: the services themselves had a very specific internal structure - they looked like some web application, and there was a small wrapper that arranged all the infrastructure around it, so that it would call into the web application's routing. If it's an event from SQS, it goes to some route with a fake payload, which is really the payload from SQS, and all kinds of other things in that style; if it's S3, it goes to some route with an S3 payload. That lets us run the whole thing inside Lambda as usual, with events and listeners and everything...
(Ran) And on commit you produce some container that wraps it...
(Or) Not even a container - we went with a plain npm start... Simply, in every project there were two scripts: one suited to Lambda, and the other a server with a small wrapper. When developers worked locally, their service talked directly to the cloud. We didn't work with RabbitMQ locally and SQS in the cloud; it was SQS in the cloud, DynamoDB in the cloud, and Redis... simply everything - every developer has a complete infrastructure, a "skeleton" of the infrastructure, without the compute. They just pick whichever service they want, npm start - and it starts "playing" against the infrastructure: against the relevant SQS, against the relevant DynamoDB; RDS - MySQL in our case - still local. That was the first approach - it worked fairly well, and we ran with it for quite a while. Now that we've become a bit more of a Lambda powerhouse, we're moving to a somewhat different approach - today we work with SAM. SAM is the competitor to Serverless - it's "AWS's Serverless.com"... The idea is that it generates CloudFormation templates for us; we deploy it as a complete package, a complete stack. And then, once you have such a stack... broadly, every developer here has a private AWS account - for now... we're still in a sort of, call it, POC to verify the feasibility of it is really fine. Every developer has an AWS account, and inside it sits a mini-production of Melio - whatever service she wants to run there: her email service, even the payments processing, everything... Then, if she wants to develop a particular Lambda, we wrote a tool of our own that essentially takes over that Lambda and routes its compute to her machine. Then she can set breakpoints, locally - it literally runs on her machine...
(Ran) Like Telepresence in the Kubernetes world...
(Or) Exactly - just with less fiddling: fewer games with ports, fewer games with networking - just take the message, send it to the machine, do the compute... Because the AWS resources are available in any case - SQS is available by API call and SNS is available by API call - the compute running locally on the machine "talks to the cloud as if it were in the cloud." So the Telepresence here, in that sense, is only about moving the messaging to the right place in, let's call it, "the global worldwide network" - to the specific machine where it currently lives.
(Ran) So a new developer who joins you - we're getting close to wrapping up, so maybe a last question - a new developer who has never experienced Serverless or the concept: how easy or hard is it, in your estimation, to get into the right mindset - Serverless, Stateless, and so on?
(Or) I admit it's a challenge... During the onboarding period of our developers, we try to instill a kind of mindset of "we live on Lambda" - with the challenges; whatever comes, we'll deal with it. Broadly, we've reached a state where a lot of engineers already work this way, so when someone joins, there's - call it support - the company's internal ecosystem that knows how to help. I can say the payments-processing folks are amazing at this - they've truly adopted it fully and they take it all the way. Even with the pitfalls and the challenges it has, they take it and run forward with it really nicely.
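The "two entry points, one service" pattern Or describes for the first approach can be sketched in a few lines. Everything here is hypothetical (route names, queue URL, payload shape); it only illustrates the idea of the same routing logic running behind a Lambda handler in production and behind a plain local process in development, both against the real cloud queue:

```typescript
// Sketch: one service, two entry points. Names and routes are made up.
import type { SQSEvent } from "aws-lambda";
import {
  SQSClient,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

// The service itself: plain routing, nothing Lambda-specific.
const routes: Record<string, (payload: unknown) => Promise<void>> = {
  "payment.scheduled": async (payload) => {
    console.log("handling payment.scheduled", payload);
  },
};

async function dispatch(body: string): Promise<void> {
  const { route, payload } = JSON.parse(body);
  const handler = routes[route];
  if (!handler) throw new Error(`no route: ${route}`);
  await handler(payload);
}

// Entry point 1: Lambda. The wrapper turns each SQS record into a "request".
export async function lambdaHandler(event: SQSEvent): Promise<void> {
  for (const record of event.Records) {
    await dispatch(record.body);
  }
}

// Entry point 2: local development ("npm start") polls the same real queue,
// so the laptop "talks to the cloud as if it were in the cloud".
export async function localLoop(queueUrl: string): Promise<void> {
  const sqs = new SQSClient({});
  for (;;) {
    const { Messages = [] } = await sqs.send(
      new ReceiveMessageCommand({ QueueUrl: queueUrl, WaitTimeSeconds: 20 })
    );
    for (const m of Messages) {
      await dispatch(m.Body ?? "{}");
      await sqs.send(
        new DeleteMessageCommand({
          QueueUrl: queueUrl,
          ReceiptHandle: m.ReceiptHandle,
        })
      );
    }
  }
}
```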
(Or) I actually wanted to touch on one more point in the Serverless context, if we have time - pricing. There's this mantra that "Serverless is much more expensive" [it depends... see episode 412 Serverless at Via], because it's essentially a premium service for running one single function. We find it - borrowing from English [we find it] - relatively speaking, if not cheaper, then comparable to the alternatives. There are a few reasons. Obviously, one of the main ones is that if we didn't run, we don't pay, but in a way...
(Ori) There's no such thing as "leaving an instance up"...
(Or) Exactly - no instance up... When it is up, it's more expensive - but most of the time, ours isn't up. Let's not say "most of the time," I'm exaggerating - but a large part of the month, it isn't up. And there's a very nice property in this context: it's relatively easy for us to do what's called unit economics, because for every processing run we know what it costs. We can do a rough calculation and translate - literally from the AWS bill - what the business activity costs, and even give forecasts based on it. And that's a very big advantage for us.
(Ori) That brings me back to a question that's been waiting since the beginning... What's the business model? I mean - are you per-transaction? Are you...
(Ran) Run ads! What's the problem?...
(Ilan) Recommendations, yes... In our system, at the end of the day, there are two kinds of transactions. There's what we call Basic Transactions, the fundamentals - moving ACH to ACH, or ACH to check - and those payments are free; they're essentially an engagement flywheel for the business, and for us - so that the business uses us. The second kind of payments is Premium Payments. If a business now wants to use a credit card - normally a credit card isn't an option, because most suppliers don't accept credit in the B2B world - we, thanks to the decoupling, let the business pay by credit card while the other side receives a check. And in doing so, we help the business defer the payment by another 30 or 45 days, to the next billing cycle of its credit card. That will cost the payer 2.9%...
(Ran) "Israeli credit" - net-plus terms...
(Ori) "The check is in the mail"...
(Ilan) And other payments that are Premium Services: ACH takes three days to get from one side to the other - that's the banking system in the US [in Israel too...]. If you now want the payment to arrive the same day, or instantly - that's a cost one of the sides can absorb, whoever wants to accelerate the payment or to receive it faster.
(Ran) By the way, in the banking system too - I assume you know this - there's also an option to expedite the payment for a "symbolic fee"...
(Ilan) Exactly - and international payouts: we're now entering international payments - and such payments cost money; a domestic wire, too. So we give the payments, the fundamental payments, for free - but the more premium payments cost one of the sides, depending on whom you sell them to. That's where our unit economics come from.
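The kind of back-of-the-envelope unit economics Or alluded to earlier is a short calculation. A sketch, using AWS's published Lambda list prices (us-east-1, x86, at the time of writing); the workload numbers are invented for illustration:

```typescript
// Sketch: translating a Lambda-based processing run into cost per payment.
const PRICE_PER_GB_SECOND = 0.0000166667;   // USD, published Lambda price
const PRICE_PER_REQUEST = 0.20 / 1_000_000; // USD per invocation

interface RunStats {
  invocations: number;   // Lambda invocations in the batch run
  avgDurationMs: number; // average billed duration per invocation
  memoryMb: number;      // configured memory size
  transactions: number;  // payments processed by the run
}

function costPerTransaction(run: RunStats): number {
  const gbSeconds =
    run.invocations * (run.avgDurationMs / 1000) * (run.memoryMb / 1024);
  const compute = gbSeconds * PRICE_PER_GB_SECOND;
  const requests = run.invocations * PRICE_PER_REQUEST;
  return (compute + requests) / run.transactions;
}

// Example: a made-up nightly cut-off run.
console.log(
  costPerTransaction({
    invocations: 10_000,
    avgDurationMs: 800,
    memoryMb: 512,
    transactions: 50_000,
  })
); // roughly $0.0000014 per payment, compute and requests only
```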
(Ori) But is there, call it, a "correlation" between the number of transactions you'll execute - which in practice means how much you'll use the Lambdas, right? - and how much money you'll make? That is, some direct ratio, but...
(Ilan) It's totally like that... At the end of the day, when we measure, we look at the total volume Melio sent out that day, or that month - and how much of that volume is volume Melio earned revenue on. And we have a target where we ask: "wait, what's the ratio?" If you look, say, at online checkout - let's say checkout with Stripe - at the end of the day, when Stripe looks at 100% of the transactions, they make some profit or other, 2.5% or whatever. At Melio it works a bit differently, because there's a blend - a blend of payments that are free and payments that cost, on which Melio earns revenue. When you look at the whole, we have a target of how many "bips" of the total TPV end up as profit, or revenue, for Melio.
(Ran) Translate for a second... bips is?
(Ilan) It's essentially the percentage on which we...
(Ori) It's profit...
(Ilan) It's the profit... it's the revenue. [Roughly: Basis points (BPS) is a common unit of measure for interest rates and other percentages in finance; one basis point equals 1/100th of 1%.] And it's a target we look at all the time. And there is that correlation, exactly as you said, Ori - when we see a business using us more, or making more payments, the relative share of Premium Payments happening there rises. So it remains economical for us to aim for higher engagement - because we know we can then derive more revenue.
(Ran) This topic of unit economics - I completely identify with it; I'm also in a place where it's very hard to understand what things cost, and I know it's significant. But I wonder how significant it even is for you - the cloud cost component, today: is it even meaningful at this stage of growth? Do you even pay attention to it?
(Ori) ...as a percentage of revenue, the cost of sales...
(Ilan) Let's put it this way - if I can, as they say, "share without sharing"...
(Ran) If the investors aren't listening... [but maybe they read?]
(Ilan) We have business costs that aren't cloud costs - the costs with the banks, with the "natural" partners, let's call them. And when you look at the overall picture, with the cloud cost included - it's not that scary.
(Ran) That's fine, and I think a lot of companies are in a place like that, especially in a growth phase, where there are far, far more significant costs - and they deliberately "pour money" on the cloud, so to speak. The problem is that they later reach a point from which it's very hard to come back: "OK, now I want to cut the cloud costs - but now it's really, really hard." [Follow-up listening: 421 The Cost of Cloud, a Trillion Dollar Paradox with Martin Casado, and 418 Carboretor 31 Cost of cloud paradox]
(Ilan) So I'll be honest with you - that consideration crosses our minds too. At the end of the day, when we said we want to be Management-less, we prefer to concentrate on the core business. Because Melio is a company that has grown - grown, and grown very fast: in the last year we grew our activity volume by 5000%... Covid, the coronavirus, gave businesses a very big boost to get rid of anything physical, or of meeting each other in order to make payments, and to move to online payments. By the way - Serverless, the Lambdas, helped us scale out very well: we built the system from the start so we could scale out well, and that helped us through the really very rapid growth that happened to us. But to your point - yes, we're much more focused on our ability to grow the business than on going off and figuring out how to save on processing costs.
(Ran) But you're at least running the wiring for the air conditioner? I mean - at some point you'll install that AC...
(Ilan) Totally... In payments companies it's customary to understand, essentially, "what a payment costs." When I look for a moment at... Melio has executed millions of transactions - what is my total cost, from the processing itself and from partner costs, per transaction? The ability to compute that is a very important ability for bringing the business to scale.
(Ran) So we mentioned you're growing - you haven't said where you live... Where's the office?
(Ilan) Our office is in Tel Aviv, on HaArba'a Street, the HaArba'a Towers. Very accessible in the sense of "close to the train" - very accessible for whoever's outside Tel Aviv, and very accessible for whoever's inside Tel Aviv. Nice, new offices, two floors - and growing...
(Ran) What are you looking for today?
(Ilan) Engineering at Melio today is about 80 people, in four groups - and we want to double the group's size, the engineering group, within the coming year...
We're looking for a bit of everything. We're looking for Full-Stack Engineers, mostly for the product-facing teams that work on the experience - we haven't talked about it much today, and Or mentioned it a little; we mostly talked about the payments processing, but at the end of the day we sell an experience: an experience that has to be very, very convenient and simple for a small business owner managing their payments. So there are product-facing teams, and those are mostly Full-Stack Engineers.
We're looking for Data Scientists - because Melio does all of the risk for the payments itself. Risk "doesn't exist" in the B2B world as something off the shelf, so we had to develop all the models ourselves. So also Big Data Engineers and Data Scientists for the Risk and Data groups.
And Backend Engineers for the Payment Processing we just talked about . . .
(Ran) Great - good luck, thanks a lot for visiting, the check is in the mail, goodbye! Happy listening, and many thanks to Ofer Forer for the transcription!
Links:
Let's Encrypt's root certificate has expired, and it might break your devices: https://techcrunch.com/2021/09/21/lets-encrypt-root-expiry/
Slack was bitten by DNSSEC: https://Twitter.com/tqbf/status/1443654964556013569
Prepare For Cybersecurity Assessments From Your Customers: https://www.securitysystemsnews.com/article/prepare-for-cybersecurity-assessments-from-your-customers
AWS Lambda now supports triggering Lambda functions from an Amazon SQS queue in a different account: https://aws.amazon.com/about-aws/whats-new/2021/09/aws-lambda-lambda-function-amazon-sqs-queue/
Migrating custom Landing Zone with RAM to AWS Control Tower: https://aws.amazon.com/blogs/mt/migrating-custom-landing-zone-with-ram-to-aws-control-tower/
Introducing the Ransomware Risk Management on AWS Whitepaper: https://aws.amazon.com/blogs/security/introducing-the-ransomware-risk-management-on-aws-whitepaper/
Validate IAM policies in CloudFormation templates using IAM Access Analyzer: https://aws.amazon.com/blogs/security/validate-iam-policies-in-cloudformation-templates-using-iam-access-analyzer/
Pacu: The Open Source AWS Exploitation Framework: https://rhinosecuritylabs.com/aws/pacu-open-source-aws-exploitation-framework/
Transcript
Corey: This is the AWS Morning Brief: Security Edition. AWS is fond of saying security is job zero. That means it's nobody in particular's job, which means it falls to the rest of us. Just the news you need to know, none of the fluff.
Corey: This episode is sponsored in part by Thinkst Canary. This might take a little bit to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, or anything else like that that you can generate in various parts of your environment, wherever you want them to live. It gives you fake AWS API credentials, for example, and the only thing that these things do is alert you whenever someone attempts to use them. It's an awesome approach to detecting breaches. I've used something similar for years myself before I found them. Check them out. But wait, there's more, because they also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment and manage them centrally. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files that it presents on a fake file store, you get instant alerts. It's awesome. If you don't do something like this, instead you're likely to find out that you've gotten breached the very hard way. So, check it out. It's one of those few things that I look at and say, “Wow, that is an amazing idea. I am so glad I found them. I love it.” Again, those URLs are canarytokens.org and canary.tools. And the first one is free because of course it is. The second one is enterprise-y. You'll know which one of those you fall into. Take a look. I'm a big fan. More to come from Thinkst Canary in the weeks ahead.
Corey: Somehow we made it through an entire week without a major vendor having a headline-level security breach. You know, I could get used to this; I'll take, “It's harder for me to figure out what to talk about here,” over, “A bunch of customers are scrambling because their providers have failed them,” every time.
So, let's see what the community had to say. Last week, as you're probably aware, Let's Encrypt's root certificate expired, which caused pain for a bunch of folks. Any device or configuration that hadn't been updated for a few years is potentially going to see things breaking. The lesson here is to be aware that certificates do expire. The antipattern is to do super-long registrations for things, but that just makes it worse. One of the things Let's Encrypt got very right is forcing 90-day certificate rotations for client certs. When you've got to do that every three months, you know where all of your certificates are. If you've got to replace it once every ten years, you'll have no clue; that was six employees ago.
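A minimal sketch of that advice in practice, using only the Python standard library (the hostnames are placeholders for your own inventory): connect to each host, read the certificate's notAfter field, and flag anything close to expiry.

```python
# Sketch: flag TLS certificates that are close to expiry.
# Hostnames below are placeholders; substitute your own inventory.
import socket
import ssl
from datetime import datetime, timezone

def days_until_expiry(host: str, port: int = 443) -> int:
    """Return the number of days until `host`'s certificate expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()  # validated against the system trust store
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    return (expires - datetime.now(timezone.utc)).days

for host in ["example.com", "api.example.com"]:
    days = days_until_expiry(host)
    print(f"{host}: {days} days left{'  <-- RENEW SOON' if days < 30 else ''}")
```

One caveat: a certificate that has already expired fails the handshake and raises instead of printing, which is arguably the alert you wanted anyway.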
In bad week news, Slack was bitten by DNSSEC when they attempted and failed to roll it out. DNSSEC is a bag of pain it's best not to bother with, as a general rule. DNS is always a bag of pain because of caching and TTL issues. In effect, Slack tried to roll out DNSSEC—probably due to a demand by some big corporate customer—had it fail, panicked and rolled back the change, and was in turn bitten by outages as a bunch of DNS resolvers had the DS key cached, but the authoritative nameservers stopped publishing it. This is a mess and a great warning to those of us who might naively assume that anything like DNSSEC that offers improved security comes without severe tradeoffs. Measure twice, cut once, because mistakes are going to show.
I also found a somewhat alarmist article talking about cybersecurity assessments from your customers and, fine, but it brings up a good point. If you're somehow responsible for security but don't have security in your job title—which, you know, this show is aimed at—you may one day be surprised to have someone from sales pop up and ask you to fill out a form from a prospective customer. Ignore the alarm and the panic, but you're going to want to get towards something approaching standardization around how you handle those.
The first time you get one of these, it's a novel exercise; by the tenth, you just want to have a prepared statement you can hand them so you can move on with things. Well, those prepared statements are often called things like, “SOC 2 certifications.” There's a spectrum, and where you fall on it depends upon who you work for and what you do. So, take them seriously and don't be surprised when you get one.
AWS had a few interesting security-related announcements. AWS Lambda now supports triggering Lambda functions from an Amazon SQS queue in a different account. That doesn't sound like a security announcement, so why am I talking about it? Because until recently, it wasn't possible, so a lot of folks scoped their IAM policies very broadly; what do you care if any random SQS queue in your own account can invoke a Lambda? With this change, suddenly internet randos can invoke Lambda functions, and you should probably go check production immediately.
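What "go check production" looks like in practice is making both sides of that trust explicit. A hedged sketch with boto3 (every ARN, URL, and name below is a placeholder, and this is one way to wire it, not the only one): the queue policy names one specific execution role rather than a wildcard, and the event source mapping names one specific queue.

```python
# Sketch: scope a cross-account SQS -> Lambda trigger to one queue and one
# role instead of a wildcard. Every ARN, URL, and name is a placeholder.
import json
import boto3

QUEUE_ARN = "arn:aws:sqs:us-east-1:111122223333:orders-queue"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/orders-queue"
CONSUMER_ROLE = "arn:aws:iam::444455556666:role/my-function-role"

# In the queue's account: allow exactly one execution role to consume.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowOneLambdaExecutionRole",
        "Effect": "Allow",
        "Principal": {"AWS": CONSUMER_ROLE},  # one role, not "*"
        "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage",
                   "sqs:GetQueueAttributes"],
        "Resource": QUEUE_ARN,
    }],
}
boto3.client("sqs").set_queue_attributes(
    QueueUrl=QUEUE_URL, Attributes={"Policy": json.dumps(policy)}
)

# In the function's account: wire the specific foreign queue to the function.
boto3.client("lambda").create_event_source_mapping(
    FunctionName="my-function",
    EventSourceArn=QUEUE_ARN,
    BatchSize=10,
)
```

The function's execution role still needs sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes on that queue ARN; the point is that nothing on either side says "*".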
Announcer: Have you implemented industry best practices for securely accessing SSH servers, databases, or Kubernetes? It takes time and expertise to set up. Teleport makes it easy. It is an identity-aware access proxy that brings automatically expiring credentials for everything you need, including role-based access controls, access requests, and the audit log. It helps prevent data exfiltration and helps implement PCI and FedRAMP compliance. And best of all, Teleport is open-source and a pleasure to use. Download Teleport at goteleport.com. That's goteleport.com.
Corey: Migrating custom Landing Zone with RAM to AWS Control Tower. It's worth considering the concept here because “using the polished thing” is usually better than building and then maintaining something yourself. You wind up off in the wilderness; then AWS shows up and acts befuddled: “Why on earth would you build things the way that we told you to build them at the time you set up your environment?” It's obnoxious and they need to stop talking and own their mistakes, but keeping things current with the accepted way of doing things is usually worth at least considering.
AWS has a whitepaper on Ransomware Risk Management out and I'm honestly conflicted about it. There are gems, but it talks about a pile of different services they offer to offset the risk. Some of them—like AWS Backup—are great. Others—“Use Systems Manager State Manager”—present as product pitches for products of varying quality and low adoption. On balance, it's worth reading, but retain a healthy skepticism if you do. It should be noted that the points they address and the framework they lay out are exactly how risk management folks think, and that's helpful.
Validate IAM policies in CloudFormation templates using IAM Access Analyzer. I like that one quite a bit. It does what it says on the tin, and applies a bunch of more advanced linting rules than you'd find in something like cfn-lint. Note that this costs nothing for a change, even though it does communicate with AWS to run its analysis. Note that as AWS improves the Access Analyzer, findings will likely change, so be aware that this may well result in a regression should you have it installed as part of a CI/CD pipeline.
And as far as tools go, if you're not a security researcher, good; you're in the right place. But that said, if you have a spare afternoon at some point, you may want to check out Pacu—that's P-A-C-U. It's an open-source AWS exploitation framework that lets you see just how insecure your AWS accounts might be. I generally leave playing with those sorts of things to security professionals, but this is a fun way to just take a quick check and see if there's a burning fire that might arise for you down the road. And I'll talk to you more about all this stuff next week.
Corey: I have been your host, Corey Quinn, and if you remember nothing else, it's that when you don't get what you want, you get experience instead. Let my experience guide you with the things you need to know in the AWS security world, so you can get back to doing your actual job. Thank you for listening to the AWS Morning Brief: Security Edition.
Announcer: This has been a HumblePod production. Stay humble.
In this episode, I spoke with Ben Runchey, one of our solutions engineers here at AxonIQ. In his previous project, Ben went from using various tools, such as MongoDB, RabbitMQ, SNS, and SQS, to using Axon Framework and Axon Server as the messaging platform, event store, and more. We talked about the challenges and the benefits of making this decision. We also discussed scalability, monitoring, planning, education, and much more. Ben and I also briefly talked about the Prometheus and Grafana monitoring for Axon Server that our colleague, Stefan Dragisic, worked on, and the benefits of using it. You can find out more about it here. Connect with Ben on LinkedIn. Connect with Sara on LinkedIn and Twitter. For more information about us visit axoniq.io
About Eric
Eric Dynowski, Managing Partner and Chief Solutions Officer at Deft, has been developing software, designing global infrastructures, and managing large technology installations for over 20 years. His background in complex infrastructure design and integration has helped him reduce customer budgets by millions.
Links: Deft: https://www.deft.com
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: You could go ahead and build your own coding and mapping notification system, but it takes time, and it sucks! Alternately, consider Courier, who is sponsoring this episode. They make it easy. You can call a single send API for all of your notifications and channels. You can control the complexity around routing, retries, and deliverability and simplify your notification sequences with automation rules. Visit courier.com today and get started for free. If you wind up talking to them, tell them I sent you and watch them wince—because everyone does when you bring up my name. That's the glorious part of being me. Once again, you could build your own notification system, but why on god's flat earth would you do that?
Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It's an awesome approach. I've used something similar for years. Check them out. But wait, there's more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It's awesome. If you don't do something like this, you're likely to find out that you've gotten breached, the hard way. Take a look at this. It's one of those few things that I look at and say, “Wow, that is an amazing idea. I love it.” That's canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I'm a big fan of this. More from them in the coming weeks.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. For a while I've been talking about how I started The Duckbill Group as a highly niche, highly focused consultancy aimed at one very expensive problem—the AWS bill—and that's all we do. We don't do implementation; we simply do the thing that it says on the tin. And it turns out that you can get fairly good at a problem like that in not a tremendous amount of time. And on today's promoted episode of Screaming in the Cloud,
my guest is Eric Dynowski, Chief Solutions Officer and Partner at Deft, which is a consulting company that is almost inverted, in the sense that they do an awful lot of stuff. And we're going to argue about it now. Eric, thank you for taking the time to speak with me today.
Eric: Great to be here, Corey. Can't wait to dig in. [laugh].
Corey: So, when I started this place back almost five years ago now, I was coming out of an engineering career that I found deeply unsatisfying. I had a bunch of working-with-computers skill sets, and I wanted to apply them to something that I could use as an independent consultant—because who would ever build a team or a company out of this?—that I could wind up doing repeatedly, so I could, you know, make money, not have a boss, and ideally work less than 80 hours a week. And fixing the AWS bill was the first one I tried; it turned out that, yeah, there is, in fact, an awful lot of expensive business problem hidden in there. And that's what I've been focusing on ever since. I get the distinct impression that your story doesn't quite sound like that one.
Eric: Well, it's a little bit like that one, in the sense that I was once an engineer and a technician doing things, and had the crazy idea to go off and start a company that consulted and helped people with their technology needs. And maybe we're a little bit alike in the sense that we also have a very singular focused offering—I'm going to tease you a little bit here, Corey—in that we do one thing: we provide technology solutions for our customers. But probably what you are getting at is, “Well, that's a really broad statement, Eric, what is a technology solution?”
Corey: It is. And the reason I did this is because my marketing budget was 50 bucks. So, what I wanted was the Rolodex effect, where when someone says at a party to someone else, “Well, I have a problem: my AWS bill,” I want the answer to be, “Hey, I have someone for you to talk to.” You don't generally tend to see that with more broad statements around positioning most of the time.
Eric: Sure. Sure.
Corey: I also felt like if I was going to be, all right, I'm the best cloud architect advisor ever. Great, now I'm competing against folks like you and the giant consultancies that are in every country. And honestly, those folks have better airport ads than I'm going to be able to put up, at least at the time. Now, I have a platypus for a mascot. So, one wonders whether that would still hold true. You took a very different approach and have done fantastically—
Eric: Yeah.
Corey: —well with it.
Eric: Yeah, I mean, maybe a little bit of history here is helpful. When I started my company, which was Turing Group back in 2013, we did actually focus pretty tightly just on AWS, and that's all we did. We wanted to help customers get in AWS, fix their problems in AWS, scale in AWS, manage their costs, all these sorts of things. And along the way, we had customers coming back to us saying, “We love you guys. You're doing great in [laugh] AWS, but I have other needs too.” And I started saying, “Well, I know a guy over there that does that.” And, “I've got a good friend over here at this company that does it.” And, you know, we're referring back and forth. And kind of parallel to that, one of my business partners, who was running a company called Server Central, was having the exact inverse problem.
And, you know, they were providing managed services within the data center, managed networking capability, things of that nature, you know, helping customers build out an infrastructure to operate at scale, where it's like, “Oh, we need 1000 servers, and we need them really inexpensive, and we have to be able to manage them at scale.” And they were crushing it, doing a great job, except his customers started getting back to him, saying, “Love what you guys are doing. We're also using AWS, and we want some help with that.” And so, he would pick up the phone and call me and say, “Eric, can you guys help here?” And in some sense, we were competitors. I wanted to move everybody out of the data center into the cloud. And he wanted to move everyone out of the cloud back into the data center or keep them in the data center. And it was like, “Okay, this is weird, but we both have the same problem.” And so we went out to lunch and started this conversation of like, “Well, what if we weren't competitors?” [laugh].
Corey: “Maybe there's alignment here.” Yeah. I think Ben Franklin once said that three moves is as good as a fire when it comes to cleaning out old cruft. And migrations are like that. I do want to call out that since I make a frequent practice of saying that multi-cloud as a best practice is foolish, I want to be clear that is in the absence of other constraints. If you're building something new, you probably should pick a provider and go all-in. I don't care which one you do—you might; I don't—but beyond that, at a significant point of scale, when a company says, “All right, we're in one provider—or a data center—and we're going to move to a cloud or other clouds,” generally, they're correct. They have context that I don't when I'm speaking in the general case. I am not anti-multi-cloud; I am anti-multi-cloud when it is foolish and when it's badly done. Just to set my bias out there and, ideally, avoid getting some letters.
Eric: You know, I tend not to disagree with that in the right context as well. When we were doing just AWS only, I think I would have argued that multi-cloud, yeah, there's no place for it. If you're going multi-cloud, you're giving up on all of the greatness that a single public cloud has to offer. You know, and by the greatness, I mean the proprietary services they offer, and the APIs, and things like SNS and SQS and Route 53, all of those things that you could build into your application and just start using them without having to build all the infrastructure to run it. And so I would agree with you, I think, in that sense. But I wish the world were that simple and I wish the companies that we worked with operated in a nice, clean, unambiguous context. But the more you dig in, I think you realize that when you start dealing with a company, maybe, that has 20,000 employees and offices all across the country, and 25 years of legacy applications—or maybe a vision for the future that is just so massive that it requires a different point of view—and this is really where we engage.
And it's interesting that you mentioned the context; usually, if a customer approaches us and says, “We want to go multi-cloud,” the first thing we do is put the brakes on and get away from the discussion around multi-cloud and move straight into, “What are you trying to accomplish here?” And navigate that conversation before it turns into—you know, let's not start with the solution type of discussion.
Corey: Part of the problem, it seems, is that when you start talking to folks about these things, especially in some vendor corners, there's an awful lot of self-interest that winds up informing the answers that immediately come from it, where it's, “Oh, yeah, you want to go multi-cloud.” And you scratch underneath the surface and the reason is that if you go all-in on one provider, they have nothing left to sell you, in some cases. Other times, it's a cloud provider themselves pushing multi-cloud strongly, because they know that if you go all-in on one provider, it will absolutely not be them. So, the hard part is finding someone who can serve as a trusted advisor. And I mean that in the actual sense of a person, not the crappy AWS service that tells you everything's fine when it isn't. That's ‘Plausible Advisor' at best. Let's be clear here.
Eric: Well, come on. If it wasn't for Trusted Advisor, we wouldn't have a market for all the third-party analysis tools [laugh] and people like you to help us manage our costs.
Corey: Believe me, if I thought a tool could solve the problem, I would have built it years ago. Tried, turned out it didn't, and well, here we are. You position what you do at Deft as starting from an advisory perspective. You're not pushing a particular product, you're not pushing any particular vendors that I'm aware of, you have a partner list that is not a single vendor, what a concept. So, it's clear that you're doing something that goes one of two ways. Either it is, yeah, we'll take money from anyone who will pay us, or alternately you're approaching it from a thoughtful perspective of trying to figure out what's going on with the customer. Based upon our conversations, I'm going to go ahead and guess it's that one.
Eric: It definitely is the latter, Corey. We've certainly had many customers approach us over the years and ask for things that are a bad fit. And a bad fit might be, they're asking for technology experience we don't have. If someone came to us and said, “We want you guys to be Oracle DBAs because you do technology,” our answer is very likely going to be no. Could we learn to be Oracle DBAs? Yeah, we probably could. Do we want to? Probably not. Maybe?
Corey: There are some very qualified Oracle DBAs out in the world, and it's—
Eric: There's—
Corey: —great.
Eric: —there's places for people to specialize. And I think one of our virtues is to know and recognize where we belong and where we don't belong. And the good thing is, not only have we sort of built up our own partner list in terms of technology partners, strategic partners, but we also have our own internal list of referral partners, where we know that something's out of our wheelhouse, and I've got another company and another team here that I know can crush it and help them out. There's other areas of work that we just don't get into. If you want to outsource your IT and have a company that's going to help you figure out why your printer is not working, definitely not us. You're going to be wasting your money with a firm like us.
Or maybe you want to partner in a way that isn't going to take the best advantage of the capabilities that we have, meaning you just want to take advantage of us in a halfway manner, and you want to keep an internal team, and the two teams are up against one another and fighting about stuff constantly. We need to have good strong trust between our two companies and our partnerships. And so if we feel like there isn't an opportunity for that, we might walk away from it as well. So, it's not a case of everything that walks in front of us gets a proposal. [laugh]. We definitely do some opportunity vetting and analysis. We ask a lot of questions upfront about how our customers work, what their internal teams are like, what their expectations of us are, what they want in that relationship. Is it transactional or is it strategic? We're interested in the strategic partnerships with our customers.
Corey: I think this is something that is not well understood by a lot of the fly-by-night folks, for lack of a better term. I don't mean to sound disparaging, but the folks who don't seem to understand that long-term reputation is important. I mean, both of our consulting companies, although radically different in focus, have pages on our site where we list reference customers. In fact, there's some overlap between our customers. And as we look at this, you aren't allowed to put a customer logo up if they're going to take umbrage to you doing it, first off. And secondly, you don't want to put a customer logo up if people are going to ask them about their experience with you, and the response is, “Oh, they were crap.” At some point, no, let's not do that.
Eric: [laugh]. That's right.
Corey: The only way to get there is to deliver on an engagement in such a way that the response is, “That was great. Would you do it again?” “Can I?” And the idea of excitement of delivering an outcome where people who you've worked with become some of your biggest advocates, that's how I always viewed the proper way of building a business.
Eric: Yeah. Now, I'd ask you, in terms of when you guys are providing advice to your customers about AWS spend, or someone approaches you, obviously, you're probably first thinking, “Okay, well, how much AWS spend do you have in a month and is this worth my time?” But there's probably another element of evaluation that you must do in terms of, is this a good customer for us, and can we do the right thing for them? What are the pieces that you guys think about?
Corey: Oh, absolutely. As a general rule, we do a lot of AWS contract negotiation. And that is, if you have an AWS offer in front of you for committing to something, come talk to us; it's fun. That's half of our engagements today. The other half are cost optimization projects, and generally speaking, we aren't going to be able to effectively deliver return on investment for much less than about a million bucks a month in spend. So, I do at some point want to explore how to help people who are not already paying a king's ransom to AWS every month, but that is down the road. The next step is a conversation. It's a, “So great, you want to optimize your AWS bill.” And then my favorite is the—I get the quote-unquote, “dumb question.” “Why? Why do you care? Why is this an actual problem other than it looks like a phone number and your CFO has some questions? What is the actual concern?” Very often, we'll find that it's not that you're spending too much on cloud; in many cases, it's that it's not understood what it's doing.
“Okay, the bill is 20% higher this month. Is that the new normal? Is that something that's going to inform our planning, and do we adjust our expectations for what it's going to cost to run this? Is it just a mistake that someone left up?” The same questions would have arisen if the bill were 20% lower, except somehow it never is.
Eric: Right, right. You know what's interesting about that, Corey? I think you tend to tease AWS from time to time. And also I think the work you do would not be in Amazon's interest, right? They want customers to spend more money on their infrastructure and their services and their capabilities, and you're helping customers spend less. But what's interesting about that is that we're one of the few managed service partners in the country. And I don't know how much you know about that program, and what it takes to get into that program and to maintain the certification in that program. It's like a three-day audit; there's 500 control items that we have to go through. In fact, it was that program that took our business to the next level. It was that program and its rigor that took what we were doing and actually matured it and turned it into what I would consider a respectable world-class operation. But one of the interesting aspects of that audit is that there's several control items in there that ask us to show Amazon that we are taking steps to manage our customers' costs on AWS and reduce spend. And it's interesting that Amazon is the one pushing that on us and instilling that requirement as we support our customers in Amazon.
Corey: It's counterintuitive, but this is one of those areas where there's no one on the other side of this issue. Of course, Amazon wants customers to spend more money with Amazon—I swear the company spends half its time lying awake at night worrying someone who isn't them is making money somewhere, at least that's how it feels some days. But they want that spend to be intelligent. They don't want the narrative to be that the cloud is just a bunch of expensive nonsense. If there's a bunch of instances that are sitting there idle, they will advise you—if they're on top of the game—to turn them off, because that is the goal. They want it to be—
Eric: That's how they deliver on their promise. Right.
Corey: Well, yeah. Pandemic aside, with most of our customers, what we notice a year after an engagement is that they're in fact spending more than they were when we started. But it's more efficient; it's growth that's tying into this. It becomes a component of cost of goods sold, where, “Yeah, we're doing more business, so it costs us more to fulfill that business; we're perfectly happy,” is generally the response to that. And I think that everyone with a vision that extends beyond this quarter's numbers is likely to start to get into that, on some level.
Eric: Right.
Corey: One thing I do want to ask you is—relevant to what you just said—one thing that we do at The Duckbill Group explicitly is we have no partners, full stop. And the reason behind that is because with what we're doing around billing, and money, and contract negotiation, and the rest, as soon as we have a partner, it suddenly gives rise to a bunch of real or perceived conflicts of interest. And in this particular niche, it makes an awful lot of sense not to do that. Now, if you're in any other arena—“Oh, you're in security, for example. Oh, we have no partners with any vendors”—the question becomes, “Well, what's wrong with you?
What, there's no one willing to trust you? Do you think somehow you're better than all these other people?” It's the wrong answer. So, my question for you is, how do you evaluate whether you should partner with a particular company or not?
Eric: Sure. Great question. Deft's reason for existence—when we think about ourselves, we reframe it as our purpose—is to deliver on the promise of technology. And if you unpack that statement a little bit, it's like, “Well, what the heck is the promise of technology? What does that even mean?” And what we get down to is that technology itself doesn't make any kind of promises. That router you just bought, it doesn't promise really anything; that EC2 instance you just booted in AWS doesn't really ultimately, at the end of the day, promise anything. It's incapable of making promises, but people are. And what we promise is that we can wrangle that technology, we can configure it, we can set it up in a particular kind of way, we can bring in the right components into the solution, and deliver on a promise of, “Yes, you can scale to 100 million users,” or, “Yes, you can reduce latency and improve the customer experience for your customers.” It's all about the people, and that's what we have the most of. And that's the best thing that we have in our house. So, we have an inventory of highly qualified, talented, empathetic, compassionate, excited people. So, when we start thinking about our partners and who we want to partner with, what we take into consideration is what technology, tools, and capabilities do our people need to have in their toolbox, such that when we start working with our customers crafting that promise and that solution, we've got the right things at hand and at the right time. And then the second piece of it is, does the partner align well with us in terms of our vision? And in some sense, keeping us relatively technology-neutral, in the same sense that you're trying to stay neutral from that billing perspective and making sure that you're looking out and advocating for your customers first. So, when we're thinking about our partners as well, it's not that, oh, well, we want to build our whole business on top of AWS, or Azure, or in our data center. We try to remove dogma from the picture in that sense, and try to probably be dogmatic mostly about the customer and what it is that they're trying to achieve, and being honest with them. So, it's more of a, “Hey, let's scan the horizon. Let's listen to our customers, let's understand what problems they're trying to solve, what challenges they have today. Let's evaluate the technology options on the table across the world.” Our partner might be [unintelligible 00:18:36], it might be VM, or it might be Amazon, it might be a small little company somewhere that does a niche service. But our job is to come together with all of those things and present a cohesive solution.
Corey: And it's clearly working. You were the Turing Group and you wound up partnering with—you said your business partner—who was over at Server Central, which, I'm just guessing from the name and assuming I hadn't paid attention to the industry for a while, sounds like it might not be fully cloud-focused, on some level, given the name. What did they do? And why was merging the right answer?
Eric: Yeah. I mean, it's funny that you bring that up. Server Central. Wow, a server; who's talking about servers anymore, right?
It's containers and virtualization and—
Corey: But they've got to run somewhere.
Eric: That's right. Serverless applications. Like, hmm, is this the right name? And I think it speaks to the 20-year heritage that Server Central has had and how they built their business. And they do, and did; and we do have a cloud focus that's not related to the public cloud. We have a significant number of customers that operate on private clouds that we've built for them and manage for them, for various reasons. Some are legit and some maybe not so legit, and mostly about how they feel about something. And some of them are technically driven. And after we brought the companies together, we realized that, hey, you know what, we have a lot of brand equity and history in Server Central and we have to respect that. Turing Group had its own set of brand equity in the market that we had established, and promoted a certain kind of ideology and thinking. And so for a short period of time, we were a little bit unsure of how are we going to bring this together in any kind of cohesive fashion? And it actually went out into the world for about a year as Server Central Turing Group. And I think my tongue twisted as I said it, [laugh]. It's a lot of words and it mostly just confuses people and makes them scratch their head. And so we went off on a journey to figure out what our reimagined new company is going to look like with all the combined services and capabilities that we have. And that's how we arrived at Deft, and the idea of how we want to engage with our customers; that's what we want our solutions to be like; that's what we want the experience of working with us to be. And we want to remove the friction and anxiety that technology can bring. It reminds me of when I was starting my first company. I spent a lot of time sort of navel-gazing, saying, “What do I like about this, and why am I doing this?” And went back to my early days as an engineer, and at the core of it was a really simple idea, and it was the idea that when I helped somebody with a technology problem, they were elated. They thought it was magic. They thought it was black magic. They didn't understand how I took this goofy, strange, cryptic thing and made it do what they wanted, and I did it quickly and I did it deftly. And there was joy and they were happy and I loved that; I loved that response. I loved knowing that I helped fix this mysterious problem for somebody that just didn't even know where to begin. And I did it time and time again and it helped me grow my career. And when we started the first company, it was sort of like, I want to continue that feeling. I want to create that feeling for our customers where they feel like maybe they're stuck with some crazy complex technology problem, and because I happen to have the innate skill for understanding these things and figuring these things out, and I have a team that can do it, we can create that same feeling for our customers. And we want to continue doing that today.
Corey: This episode is sponsored by our friends at Oracle. HeatWave is a new high-performance accelerator for the Oracle MySQL Database Service, although I insist on calling it “my squirrel.” While MySQL has long been the world's most popular open source database, shifting from transacting to analytics required way too much overhead and, ya know, work.
With HeatWave you can run your OLTP and OLAP, don't ask me to ever say those acronyms again, workloads directly from your MySQL database and eliminate the time-consuming data movement and integration work, while also performing 1100X faster than Amazon Aurora, and 2.5X faster than Amazon Redshift, at a third of the cost. My thanks again to Oracle Cloud for sponsoring this ridiculous nonsense.
Corey: I do want to point out that, at least in my mind, there's always a little bit of, I guess we call it technical elitism, on some level, where, “Oh, someone is working through a partner. They must be a company that's stuck in the past.” But a glance at the companies that you're working with makes it very clear that's not the case. I mean, Ars Technica, New Relic, Wildbit. You've got some companies that are very forward-looking, and by no definition are these companies that don't understand cloud or understand how the internet works. It's something that I think is not fully understood among a subset of the industry: in many cases, having a third-party partner winds up helping you go faster, further.
Eric: Yeah. It might be cliche to say—oftentimes, in cliches, there's a little bit of truth—which is, focus on what you're good at, focus on what you're best at, and focus on your core products. And with a lot of these technology companies, you might read on the surface that, “Oh, yeah, they're a smart bunch over there. Why do they need a partner?” Technology is complicated. The stack is deep. Whether you're talking about deployment pipelines, or should I use a fiber connection on this or should I use copper? Or should we have jumbo frames enabled? Or should we be using API Gateway and Lambda functions for this? I just listed a broad range of technologies and things that solve different problems. And these customers have their own products that they have to put out into the world; those products need to be meaningful and thoughtful and aligned with their customers. And because that technology stack is complex and deep, it creates an opportunity for companies like ours—for partners—to step in, and grab a piece of that complexity, and manage it, and handle it, and help a customer with it, to create the space for them to create the most excellent product. And so even though they are technology companies, you're wrangling these big, layered technologies and abstractions—well, even when we talk about containerization, right, and running a small application, there's seven layers before you get to the CPU. [laugh]. And within those seven layers, there's I don't know how many lines of code, and who knows how many hidden assumptions and configuration files, and you name it. And there's areas of that entire stack that we're really good at, and customers derive value from that.
Corey: One area that you've been relatively active within is the hybrid universe. My talking point on that has generally looked a lot like the snarky take of, “Well, you have a company in a data center today, and they're going to go all-in on the cloud. And it turns out halfway through that it's hard to move some workloads. There is no AWS/400 and they have a mainframe.” So, what are they going to do? They give up halfway, plant a flag, declare victory, and now we're hybrid as a best practice. That is not entirely accurate, but there's an element of accuracy in some cases to it.
Eric: Yeah.
Corey: But I don't get the sense that's how you see it.
I'm left with a strong impression that it's a very intentional choice for some of your customers: that in some cases, workloads that are live in data centers were at one point living in a cloud provider. Talk to me about that.
Eric: Yeah, I like to think of it not as, like, a binary situation, but as something that exists in degrees, and oftentimes has a lot to do with the lifecycle of a product or company and the scale of a company. And we touched on this earlier in our conversation, which is that if someone approached us—maybe a startup or a smaller company—trying to migrate off of a half-dozen servers and move into the public cloud, and they approached us and said, “Yeah, we need to be hybrid for this thing,” I would probably question that and I would question it really hard, and say, “Really, what are you going after here? You're going to give up a lot if you choose to go hybrid, and you won't really take advantage of some of the amazing opportunities that a full-on single cloud solution has to offer.” On the flip side, we've seen companies that started like that, were a hundred percent in public cloud on a single provider, and everything's working fine. There were no issues whatsoever, except the bill, or maybe a fear of what the bill could be. And this is something that happens at scale. There's just a point where the public cloud just doesn't make sense anymore, even despite those benefits. And for the bottom line, in terms of the margins and your cost of revenue, giving up some of those additional benefits that allow you to grow and scale is worth it. And if you look at the technology landscape, there's a reason Facebook's not in AWS. [laugh]. There's a reason a lot of these larger technology companies, with hundreds of millions of users or bazillions of petabytes of data, move out and get out of there. I mean, look at Dropbox, right? [laugh]. Is Dropbox storing all their data in the public cloud? Not really, and they are a public cloud in a sense on their own, right?
Corey: Yeah, they did just launch a 34-petabyte data warehouse for analytics on AWS, and they've made a bunch of big—
Eric: Yeah, yeah. I mean—
Corey: —deals out of that, but the core storage workload, yeah, that does not economically make sense, given their access patterns and how they have built that offering. So yeah, that is a very well understood, very specific, very niche workload. Yeah, that does not belong in AWS.
Eric: We've even gone as far as launching our own multi-petabyte managed object storage solution; it's totally S3-compatible. It works identically to the way S3 works, but we have customers that actually can do better on our platform, either because we can provide lower latency, or we can provide custom contracts that aren't just purely pay-as-you-go; there's all kinds of different options that we can give our customers that are more custom and tailored to their needs than you're going to get from, “Here are your API keys, have fun.” [laugh]. And so there's still a market for that stuff and there's still a need for that stuff.
Corey: There really is.
Eric: So, the answer is it depends, Corey. [laugh].
Corey: It seems to be the answer to any nuanced question. So, if people want to learn more about what you're doing at Deft, and potentially whether it might help them with some of the challenges they're facing, where can they find you?
Eric: Easy. deft.com, D-E-F-T dot com. Great, short four-letter domain name that you wouldn't believe what we had to go through to get. [laugh].
Corey: I can only imagine.
Thank you so much for taking the time to speak with me today. I really appreciate it.
Eric: You're welcome, Corey.
Corey: Eric Dynowski, Chief Solutions Officer and Partner at Deft. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice, along with a comment telling me that no, customers should in fact go all-in on your third-rate cloud.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
About Ben Kehoe
Ben Kehoe is a Cloud Robotics Research Scientist at iRobot and an AWS Serverless Hero. As a serverless practitioner, Ben focuses on enabling rapid, secure-by-design development of business value by using managed services and ephemeral compute (like FaaS). Ben also seeks to amplify voices from dev, ops, and security to help the community shape the evolution of serverless and event-driven designs.
Twitter: @ben11kehoe
Medium: ben11kehoe
GitHub: benkehoe
LinkedIn: ben11kehoe
iRobot: www.irobot.com
Watch this episode on YouTube: https://youtu.be/B0QChfAGvB0
This episode is sponsored by CBT Nuggets and Lumigo.
Transcript
Jeremy: Hi, everyone. I'm Jeremy Daly.
Rebecca: And I'm Rebecca Marshburn.
Jeremy: And this is Serverless Chats. And this is a momentous occasion on Serverless Chats because we are welcoming in Rebecca Marshburn as an official co-host of Serverless Chats.
Rebecca: I'm pretty excited to be here. Thanks so much, Jeremy.
Jeremy: So for those of you that have been listening for hopefully a long time, we've done over 100 episodes. And I don't know, Rebecca, do I look tired? I feel tired.
Rebecca: I've never seen you look tired.
Jeremy: Okay. Well, I feel tired because we've done a lot of these episodes and we've published a new episode every single week for the last 107 weeks, I think at this point. And so what we're going to do is, with you coming on as a new co-host, we're going to take a break over the summer. We're going to revamp. We're going to do some work. We're going to put together some great content. And then we're going to come back on, I think it's August 30th, with a new episode and a whole new show. Again, it's going to be about serverless, but what we're thinking is ... And, Rebecca, I would love to hear your thoughts on this, as I come at things from a very technical angle, because I'm an overly technical person, but there's so much more to serverless. There's so many other sides to it that I think that bringing in more perspectives and really being able to interview these guests and have a different perspective I think is going to be really helpful. I don't know what your thoughts are on that.
Rebecca: Yeah. I love the tech side of things. I am not as deep in the technicalities of tech, and I come at it, I think, from a way of loving the stories behind how people got there and perhaps who they worked with to get there, the ideas of collaboration and community, because nothing happens in a vacuum and there's so much stuff happening and sharing knowledge and education and uplifting each other. And so I'm super excited to be here and super excited that one of the first episodes I get to work on with you is with Ben Kehoe, because he's all about both the technicalities of tech and also, it's actually on his Twitter, new compassionate tech values around humility, and inclusion, and cooperation, and learning, and being a mentor. So couldn't have a better guest to join you in the Serverless Chats community and be here for this.
Jeremy: I totally agree. And I am looking forward to this. I'm excited. I do want the listeners to know we are testing in production, right? So we haven't run any unit tests, no integration tests. I mean, this is straight test in production.
Rebecca: That's the best practice, right? Total best practice to test in production.
Jeremy: Best practice. Right. Exactly.
Rebecca: Straight to production, always test in production.
Jeremy: Push code to the cloud. Here we go.
Rebecca: Right away.
Jeremy: Right.
So if it's a little bit choppy, we'd love your feedback though. The listeners can be our observability tool and give us some feedback, and we can ... and hopefully continue to make the show better. So speaking of Ben Kehoe, for those of you who don't know Ben Kehoe, I'm going to let him introduce himself, but I have always been a big fan of his. He was very, very early in the serverless space. I read all his blogs very early on. He was an early AWS Serverless Hero. So joining us today is Ben Kehoe. He is a cloud robotics research scientist at iRobot and, as I said, an AWS Serverless Hero. Ben, welcome to the show.
Ben: Thanks for having me. And I'm excited to be a guinea pig for this new exciting format.
Rebecca: So many observability tools watching you be a guinea pig too. There's lots of layers to this.
Jeremy: Amazing. All right. So Ben, why don't you tell the listeners, for those that don't know you, a little bit about yourself and what you do with serverless?
Ben: Yeah. So I mean, as with all software, software is people, right? It's like Soylent Green. And so I'm really excited for this format being about the greater things that technology really involves in how we create it and set it up. And serverless is about removing the things that don't matter so that you can focus on the things that do matter.
Jeremy: Right.
Ben: So I've been interested in that since I learned about it. And at the time saw that I could build things without running servers, without needing to deal with the scaling of stuff. I've been working on that at iRobot for over five years now. As you said, early on in serverless, at the first ServerlessConf organized by A Cloud Guru, now Pluralsight.
Jeremy: Right.
Ben: And yeah. And it's been really exciting to see it grow into the large-scale community that it is today, and all of the ways in which community is built, like this podcast.
Jeremy: Right. Yeah. I love everything that you've done. I love the analogies you've used. I mean, you've always gone down this road of how do you explain serverless in a way to show really the adoption of it and how people can take that on. Serverless is a ladder. Some of these other things that you would ... I guess the analogies you use were always great and always helped me. And of course, I don't think we've ever really come to a good definition of serverless, but we're not talking about that today. But ...
Ben: There isn't one.
Jeremy: There isn't one, which is also a really good point. So yeah. So welcome to the show. And again, like I said, testing in production here. So, Rebecca, jump in when you have questions and we'll beat up Ben from both sides on this, but, really ...
Rebecca: We're going to have Ben from both sides.
Jeremy: There you go. We'll embrace him from both sides. There you go.
Rebecca: Yeah. Yeah.
Jeremy: So one of the things, though, that, Ben, you have also been very outspoken on, which I absolutely love, because I'm very much closely aligned on this topic here, is infrastructure as code. And so let's start just quickly. I mean, I think a lot of people know, or I think people working in the cloud know, what infrastructure as code is, but I also think there's a lot of people who don't. So let's just take a quick second, explain what infrastructure as code is and what we mean by that.
Ben: Sure. To my mind, infrastructure as code is about having a definition of the state of your infrastructure that you want to see in the cloud. So rather than using operations directly to modify that state, you have a unified definition of some kind. I actually think infrastructure is now the wrong word with serverless. It used to be, with servers, you could manage your fleet of servers separate from the software that you were deploying onto the servers. And so infrastructure, being the structure below, made sense. But now, as your code is intimately entwined in the rest of your resources, I tend to think of resource graph definitions rather than infrastructure as code. It's a less convenient term, but I think it's worth understanding the distinction, or the difference in perspective.
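[Editor's note: a minimal sketch of the distinction Ben is drawing, in Python with boto3. The bucket and stack names are hypothetical, and in practice you would pick one of the two approaches, not both. The first call is an operation that mutates cloud state directly; the second hands CloudFormation a definition of desired state to converge on:]

```python
# Sketch: the same S3 bucket, expressed two ways. Names are placeholders.
import boto3

# 1. Imperative: a one-shot operation that mutates cloud state directly.
#    Nothing afterwards records that this bucket is supposed to exist.
boto3.client("s3").create_bucket(Bucket="example-logs-bucket")

# 2. Declarative: a definition of the state you want to see in the cloud.
#    The engine diffs desired state against actual state and reconciles.
TEMPLATE = """
Resources:
  LogsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: example-logs-bucket
"""
boto3.client("cloudformation").create_stack(
    StackName="logs-stack",
    TemplateBody=TEMPLATE,
)
```

Lose the script in the first case and nothing remembers the bucket was supposed to exist; lose the stack in the second and the template still says so.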
Jeremy: Yeah. No, and I totally get that. I mean, I remember even early days of cloud, when we were using the Chefs and the Puppets and things like that, we were just deploying the actual infrastructure itself. And sometimes you deployed software as part of that, but it was supporting software. It was the stuff that ran in the runtime, and some configurations, but the application code, that was a whole separate process. And now with serverless, it seems like you're deploying all those things at the same time.
Ben: Yeah. There's no way to pick it apart.
Jeremy: Right. Right.
Rebecca: Ben, there's something that I've always really admired about you, and that is how strongly you hold your opinions. You're fervent about them, but it's also because they're based on this thorough nature of investigation and debate and challenging different people and yourself to think about things in different ways. And I know that the rest of this episode is going to be filled with a lot of opinions. And so before we even get there, I'm curious if you can share a little bit about how you end up arriving at these, right? And holding them so steady.
Ben: It's a good question. Well, I hope that I'm not inflexible in these strong opinions that I hold. I mean, it's one of those strong-opinions-loosely-held kind of things, where new information can change how you think about things. But I do try and do as much thinking as possible so that there's less new information that I have to encounter to change an opinion.
Rebecca: Yeah. Yeah.
Ben: Yeah. I think I tend to try and think about how people ... But again, because it's always people. How people interact with the technology, how people behave, how organizations behave, and then how technology fits into that. Because sometimes we talk about technology in a vacuum and it's really not. Technology that works for one context doesn't work for another. I mean, a lot of my strong opinions are that there is no one right answer kind of a thing, or here's a framework for understanding how to think about this stuff, and then how that fits into a given person is just finding where they are in that more general space. Does that make sense? So it's less about finding out here's the one way to do things, and more about finding what are the different options, how do you think about the different options that are out there.
Rebecca: Yeah, totally makes sense. And I do want to compliment you. I do feel like you are very good at inviting new information in if people have it, and then you're like, "Aha, I've already thought of that."
Ben: I hope so. Yeah. I was going to say, there's always a balance between trying to think ahead so that when you discover something you're like, "Oh, that fits into what I thought," and the danger of that being that you're twisting the information to fit into your preexisting structures.
I hope that I find a good balance there, but I don't have a principled way of determining that balance or knowing where you are on that it's-good-versus-it's-dangerous kind of spectrum. Jeremy: Right. So one of the opinions that you hold that I tend to agree with, I have some thoughts about some of the benefits, but I also really agree with the other piece of it. And this really has to do with the CDK and this idea of using CloudFormation or any sort of DSL, maybe Terraform, things like that, something that is more domain-specific, right? Or I guess declarative, right? As opposed to something that is imperative like the CDK. So just to get everybody on the same page here, what are the top reasons why you believe, or you think, that the DSL approach is better than that imperative approach, I guess? Ben: Yeah. So I think we get caught up in the imperative versus declarative part of it. I do think that declarative has benefits that can be there, but the way that I think about it is, with the CDK and infrastructure as code in general, I'm mildly against imperative definitions of resources. And we can get into that part, but that's not my smallest objection to the CDK. I'm moderately against not being able to enforce deterministic builds. A CDK program can do anything. It can use a random number generator and go out to the internet to go ask a question, right? It can do anything in that program, and that means that you have no guarantees that what's coming out of it you're going to be able to repeat. So even if you check the source code in, you may not be able to go back to the same infrastructure that you had before. And you can if you're disciplined about it, but I like tools that help give you guardrails so that you don't have to be as disciplined. So that's my moderately against. My strongly against piece is I'm strongly against developer intent remaining client side. And this is not an inherent flaw in the CDK; it's a choice that the CDK team has made to turn organizational dysfunction in AWS into ownership for their customers. And I don't think that's a good approach to take, but that's also fixable. So I think if we want to start with the imperative versus declarative thing, right? When I think about the developers expressing an intent, I want that intent to flow entirely into the cloud, so that developers can understand what's deployed in the cloud in terms of the things that they've written. The CDK takes this approach of flattening it down, flattening the richness of the program the developer has written into ... They think of it as assembly language. I think that is a misinterpretation of what's happening. The assembly language in the process is the imperative plan generated inside the CloudFormation engine that says, "Here's how I'm going to take this definition and turn it into an actual change in the cloud." Jeremy: Right. Ben: They're just translating between two definition formats in CDK synth. But it's a flattening process, it's a lossy process. So then when the developer goes to the Console or the API and has to go say, "What's deployed here? What's going wrong? What do I need to fix?" None of it is framed in terms of the things that they wrote in their original language. Jeremy: Right. Ben: And I think that's the biggest problem, right? So drift detection is an important thing, right? What happened when someone went in through the Console? Went and tweaked some stuff to fix something, and now it's different from the definition that's in your source repository.
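[For reference, the drift check Ben goes on to describe is exposed by the CloudFormation API; a minimal boto3 sketch, with an illustrative stack name:]

```python
import time
import boto3

cfn = boto3.client("cloudformation")

# Kick off a drift detection run for the whole stack.
detection_id = cfn.detect_stack_drift(StackName="my-stack")["StackDriftDetectionId"]

# Poll until the detection run finishes.
while True:
    status = cfn.describe_stack_drift_detection_status(
        StackDriftDetectionId=detection_id
    )
    if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
        break
    time.sleep(5)

# List each resource whose live state differs from the template definition.
drifts = cfn.describe_stack_resource_drifts(
    StackName="my-stack",
    StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
)
for drift in drifts["StackResourceDrifts"]:
    print(drift["LogicalResourceId"], drift["StackResourceDriftStatus"])
```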
And in CloudFormation, it can tell you that. But what I would want, if I was running CDK, is that it should produce another CDK program that represents the current state of the cloud, with a meaningful file-level diff against my original program. Jeremy: Right. I'm just thinking this through: if I deploy something with the CDK and I've got all these loops and they're generating functions and they're using some naming and all this kind of stuff, whatever, now it produces this output. And again, the naming of my functions might come from some function that gets called to generate the names of the functions. And so now I've got all of these functions named, and I have to go in. There's no one-to-one map, like you said, and I can imagine somebody who's not familiar with CloudFormation, which is ultimately what the CDK synthesizes and produces; if you're not familiar with what that output is and how that maps back to the constructs that you created, I can see that as being really difficult, especially for younger developers or developers who are just getting started in that. Ben: And the CDK really takes the attitude that it's going to hide those things from those developers rather than help them learn it. And so when they do have to dive into that, the CDK refers to it as an escape hatch. Jeremy: Yeah. Ben: And I think of escape hatches on submarines, where you go from being warm and dry and having air to breathe to being hundreds of feet below the sea, right? It's not the sort of thing you want to go through. Whereas some tools like Amplify talk about graduation. In Amplify they aim to help you understand the things that Amplify is doing for you, such that when you grow beyond what Amplify can provide you, you have the tools to do that, to take the thing that you built and then say, "Okay, I know enough now that I understand this and can add onto it in ways that Amplify can't help with." Jeremy: Right. Ben: Now, how successful they are in doing that is a separate question, I think, but the attitude is there to say, "We're looking to help developers understand these things." Now, if the CDK was a managed service, right? It would not need developers to understand those things. If you could take your program directly to the cloud and say, "Here's my program, go make this real." And when it made it real, you could interact with the cloud in a way where you could list your deployed constructs, right? Where you could understand the program that you wrote when you're looking at the resources that are deployed all together in the cloud everywhere. That would be a thing where you don't need to learn CloudFormation. Jeremy: Right. Ben: Right? That's where you then end up in the imperative versus declarative part where, okay, there are some reasons that I think declarative is better. But the major thing is that disconnect that's currently built into the way that the CDK works. And the reason that they're doing that is because CloudFormation is not moving fast enough, which is not always on the CloudFormation team. It's often on the service teams that aren't building the resources fast enough. And that's AWS's problem, AWS as an entire company, as an organization. And this one team is saying, "Well, we can fix that by doing all this client side." What that means is that the customers are then responsible for all the things that are happening on the client side. The reason that they can go fast is because the CDK team doesn't have ownership of it, which just means the ownership is being pushed onto customers, right?
The CDK deploys Lambda functions into your account, that they don't tell you about, that you're now responsible for. Right? Both the security and operations of. If there are security updates that the CDK team has to push out, you have to take action to update those things, right? That's ownership that's being pushed onto the customer to fix a lack of ACM certificate management, right? Jeremy: Right. Right. Ben: That is ACM not building the thing that's needed. And so AWS says, "Okay, great. We'll just make that the customer's problem." Jeremy: Right. Ben: And I don't agree with that approach. Rebecca: So I'm sure as an AWS Hero you certainly have pretty good, strong, open communication channels with a lot of different team members across teams. And I certainly know that they're listening to you and are at least hearing you, I should say, and watching you, and they know how you feel about this. And so I'm curious how some of those conversations have gone. And some teams, as compared to others at AWS, are really, really good about opening their roadmap, or at least saying, "Hey, we hear this, and here's our path to a solution or a success." And I'm curious if there's any light you can shed on whether or not those conversations have been fruitful in terms of actually being able to get somewhere, in customer and AWS terms, right? Customer obsession first. Ben: Yeah. Well, customer obsession can mean two things, right? Customer obsession can mean giving the customer what they want, or it can mean giving the customer what they need, and different AWS teams' approaches fall differently on that scale. The reason that many of those things are not available in CloudFormation is that those teams are ... It could be under-resourced. They could have a larger majority of customers who want new features rather than infrastructure as code support. Because as much as we all like infrastructure as code, there are many, many organizations out there that are not there yet. And with the CDK in particular, I'm a relatively lone voice out there saying, "I don't think this ownership that's being pushed onto the customer is a good thing." And there are lots of developers who are eating up the CDK saying, "I don't care." That's not something that's in their worry. And because the CDK has been enormously successful, right? It's fixing these problems that exist. And I don't begrudge them trying to fix those problems. I think it's a question of: do those developers who are grabbing onto those things and taking them understand the full total cost of ownership that the CDK is bringing with it? And if they don't understand it, I think AWS has a responsibility to understand it and work with it, to help those customers either understand it and deal with it, right? Which is where the CDK takes this approach of, "Well, if you do GitOps, it's all fine." And that's somewhat true, but also many developers who can use the CDK do not control their CI/CD process. So there's all sorts of ways in which ... Yeah, so I think every team is trying to do the best that they can, right? They're all working hard, and they all have ... Are pulled in many different directions by customers. And most of them are making, I think, the right choices given their incentives, right? Given what their customers are asking for. I think not all of them balance meeting customers where they are versus leading them where they need to go as well as I would like. But I think ... I had a conclusion to that.
Oh, but I think that's always a debate as to where that balance is. And then the other thing, when I talk about the CDK, is that my ideal audience there is less AWS itself and more AWS customers ... Rebecca: Sure. Ben: ... to understand what they're getting into and therefore to demand better of AWS. Which is, in general, I think, the approach that I take with AWS: complaining about AWS in public. Because I do have the ability to go to teams and say, "Hey, I want this thing," right? There are plenty of teams where I could just email them and say, "Hey, this feature could be nice," but I put it on Twitter because other people can see that and say, "Oh, that's something that I want," or "I don't think that's helpful," right? "I don't care about that," or "I think it's the wrong thing to ask for," right? All of those things are better when it's not just me saying I think this is a good thing for AWS, but it being a conversation among the community. Rebecca: Yeah. I think in the spirit, too, of trying to publicize what might be best next for customers: you said total cost of ownership. Even though it might seem silly to ask this, I think oftentimes we say the words total cost of ownership, but there are actually many dimensions to total cost of ownership, or TCO, right? And so I think it would be great if you could enumerate what you think of as total cost of ownership, because there might be dimensions along that matrices, matrix, that people haven't considered when they're actually thinking about total cost of ownership. They're like, "Yeah, yeah, I got it. Some Ops and some security stuff I have to do and some patches," but they might only be thinking of five dimensions when you're like, "Actually the framework is probably 10 to 12 to 14." And so if you could outline that a bit, what you mean when you think of a holistic total cost of ownership, I think that could be super helpful. Ben: I'm bad at enumeration. So I would miss out on dimensions that are obvious if I was attempting to do that. But a way that I can, I think, effectively answer that question is to talk about some of the ways in which we misunderstand TCO. So I think it's important when working in an organization to think about the organization as a whole, not just your perspective and your team's perspective in it. And so when you're working for the lowest TCO, it's not what's the lowest cost of ownership for my team if that's pushing a larger burden onto another team. Now, if it's reducing the burden on your team and only increasing the burden on another team a little bit, that can be a lower total cost of ownership overall. But it's also something that then feeds into things like political capital, right? Is that increased ownership that you're handing to that team something that they're going to be happy with, something that's not going to cause other problems down the line, right? Those are the sorts of things that fit into that calculus, because it's not just about what ... Moving away from that topic for a second: I think about when we talk about how does this increase our velocity, right? There's the piece of, "Okay, well, if I can deploy to production faster, right? My feedback loop is faster and I can move faster." Right? But the other part of that equation is how many different threads can you be operating on, and how long are those threads in time?
So when you're trying to ship a feature, if you can ship it and then never look at it again, that means you have increased bandwidth in the future to take on other features, to develop other new features. And so even if you think, "It's going to take me longer to finish this particular feature," but then there's no maintenance for that feature, that can be a lower cost of ownership in time than, "I can ship it 50% faster, but then I'm going to periodically have to revisit it, and that's going to disrupt my ability to ship other things," right? So this is where I've had conversations recently about increasing use of Step Functions, right? And being able to replace Lambda functions with Step Functions express workflows, because you never have to go back to those Lambdas and update dependencies in them because Dependabot has told you that you need to, or a version of Python is getting deprecated, right? All of those things. If you have your Amazon States Language, however it's been defined, right? Once it's in there, you never have to touch it again if nothing else changes, and that means, okay, great, that piece is now out of your work stream forever unless it needs to change. And that means that you have more bandwidth for future things, which serverless is about in general, right? Of saying, "Okay, I don't have to deal with these scaling problems here. Once I have an auto-scaling group, I don't have to go back and tweak it later." And so the same thing happens at the feature level if you build it in ways that allow you to do that. And so I think that's one of the places where, when we focus on, okay, how fast is this getting me into production, it's okay, but how often do you have to revisit it ... Jeremy: Right. And so ... So you mentioned a couple of things in there, and not only in that question, but in the previous questions as you were talking about the CDK in general, and I am 100% behind you on this idea of deterministic builds, because I want to know exactly what's being deployed. I want to be able to audit that and map that back. And you can audit, I mean, you could run CDK synth and then audit the CloudFormation and test against certain things. But if you are changing stuff, right? Then you have to understand not only the CDK but also the CloudFormation that it actually generates. But in terms of solving problems, some of the things that the CDK does really, really well, and this is something where I've always had this issue with just trying to use raw CloudFormation or Serverless Framework or SAM or any of these things, is the fact that there's a lot of boilerplate that you often have to do. There are ways that companies want to do something specifically. I basically probably always need 1,400 lines of CloudFormation. And for every project I do, it's probably close to the same, and then I add a little bit more to actually make it adaptive for my product. And so one thing that I love about the CDK is constructs. And I love this idea of being able to package these best practices for your company, or these compliance requirements for your company, whatever it is, be able to package these up and just hand them to developers. And so I'm just curious on your thoughts on that, because that seems like a really good move in the right direction, but without the deterministic builds, without some of these other problems that you talked about, is there another solution to that that would be more declarative? Ben: Yeah.
In theory, if the CDK was able to produce an artifact that represented all of the non-deterministic dependencies that it had, right? That allowed you to then store that artifact, so you'd come back and put that into the program and say, "I'm going to get out the same thing." But because the CDK doesn't control what's upstream of it, the code that the developers are writing, there isn't a way to do that. Right? So on the abstraction front, the constructs are super useful, right? CloudFormation now has modules, which allow you to say, "Here's a template, and I'm going to represent this as a CloudFormation type itself," right? So instead of saying that I need X different things, I'm going to say, "I've packaged that all up. Here it is as a type." Now, currently, modules can only be plain CloudFormation templates, and there are a lot of constraints on what you can express inside a CloudFormation template. And I think the answer for me is ... What I want to see is more richness in the CloudFormation language, right? One of the things that people do in the CDK that's really helpful is say, "I need a copy of this in every AZ." Jeremy: Right. Ben: Right? There's so much boilerplate in server-based things. And CloudFormation can't do that, right? But imagine that it had a map function that allowed you to say, "For every AZ, stamp me out a copy of this little bit." And then the CDK constructs would be able to translate into that. Instead of doing all this generation only down to the L1 piece, instead being able to say, "I'm going to translate this into richer CloudFormation templates, so that the CloudFormation template is as advanced as possible." Right? Then it could do things like say, "Oh, I know we need to do this in every AZ; I'm going to use this map function in the CloudFormation template rather than just stamping it out." Right? And so I think that's possible. Now, modules should also be able to be defined as CDK programs. Right? You should be able to register a construct as a CloudFormation type. Jeremy: It would be pretty cool. Ben: There's no reason you shouldn't be able to. Yeah. Because I think the declarative versus imperative thing is, again, not the most important piece; it's how do we move ... It's shifting right, in this case, right? How do you shift what's happening with the developer further into the process of deployment, so that more of their context is present? And so one of the things that the CDK does that's hard to replicate is have non-local effects. And this is both convenient and, I think, a code smell, often. So you can pass a bucket resource from another stack into a piece of code in your CDK program that's creating a different stack, and you say, "Oh great, I've got this Lambda function, it needs permissions to that bucket. So add permissions." And it's possible for the CDK program to either be adding the permissions onto the IAM role of that function, or non-locally adding to that bucket's resource policy, which is weird, right? That you can be creating a stack, and the thing that you do to that stack or resource or whatever is not happening there, it's happening elsewhere. I don't think that's a great approach, but it's certainly convenient to be able to do it in a lot of situations. Now, that's not representable within a module. A module is a contained piece of functionality that can't touch anything else.
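[A small sketch of the non-local effect Ben just described, assuming the CDK v2 Python packages (aws_cdk); stack and resource names are illustrative. As he says, the single grant call can end up editing the function's IAM role, the bucket's resource policy, or both, in stacks other than the one you appear to be working on.]

```python
from aws_cdk import App, Stack
from aws_cdk import aws_lambda as lambda_
from aws_cdk import aws_s3 as s3

app = App()
storage = Stack(app, "Storage")
compute = Stack(app, "Compute")

bucket = s3.Bucket(storage, "DataBucket")

fn = lambda_.Function(
    compute,
    "Reader",
    runtime=lambda_.Runtime.PYTHON_3_12,
    handler="index.handler",
    code=lambda_.Code.from_inline("def handler(event, context):\n    pass"),
)

# One line, non-local effects: the permissions materialize as changes to the
# function's role (Compute stack) and/or the bucket's policy (Storage stack).
bucket.grant_read(fn)

app.synth()
```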
So things like SAM, where you can add events onto a function that can go and create ... You create the API events on different functions, and then SAM aggregates them and creates an API Gateway for you. Right? If AWS::Serverless::Function was a module, it couldn't do that, because you'd have these in different places, and you couldn't aggregate something between all of them and put it in the top-level thing, right? This is what CloudFormation macros enable, but they don't have a ... There's no proper interface to them, right? They don't define, "This is what I'm doing. This is the kind of resources I can create." There's none of that that would help you understand them. So they're infinitely flexible, but then also maybe less principled for that reason. So I think there are ways to evolve, but it's investment in the CloudFormation language that allows us to shift that burden away from being a flattening inside client-side code, on the developer, to being something that can be represented in the cloud. Jeremy: Right. Yeah. And I think, from that standpoint too, if we go back to the solving people's problems standpoint, everything you explained there is loaded with nuances, it's loaded with gotchas, right? Like, "Oh, you can't do this, you can't do that." So that's just why I think the CDK is so popular: because you can do so much with it so quickly, and it's very, very fast. And I think that trade-off, people are just willing to make it. Ben: Yes. And where they're willing to make it, do they fully understand the consequences of it? And does AWS communicate those consequences well? That gets into this question: okay, you're a developer that's brand new to AWS, and you've been tasked with standing up some Kubernetes cluster, and you're like, "Great. I can use the CDK to do this." You're also tasked with the operations, and something is malfunctioning. You go in through the Console and have to figure out all the things that are out there, which are new to you because they're hidden inside L3 constructs, right? You're two levels down from where you were defining what you want, and then you find out what's wrong, and you have no idea how to turn that into a change in your CDK program. So instead of going back and doing the thing that infrastructure as code is for, which is tweaking your program to go fix the problem, you go and you tweak it in the Console ... Jeremy: Right. Which you should never do. Ben: ... and you fix it that way. Right. Well, and that's the thing that I struggle with with the CDK: how does the CDK help the developer who's in that situation? And I don't think they have a good story around that. Now, I don't know. I haven't talked with enough junior developers who are using the CDK about how often they get into that situation. Right? But I always say client-side code is not a replacement for a managed service, because when it's client-side code, you still own the result. Jeremy: Right. Ben: If a particular CDK construct was a managed service in AWS, then all of the resources that would be created underneath it would be AWS's problem to make work. And the interface that the developer has is the only level of ownership that they have. Fargate is this. Because you could do all the things that Fargate does with a CDK construct, right? Set up EC2, do all the things, and represent it as something that looks like Fargate in your CDK program. But every time your EC2 fleet is unhealthy, that's your problem. With Fargate, that's AWS's problem.
If we didn't have Fargate, that's essentially what the CDK would be trying to do for ECS. And I think we all recognize that Fargate is very necessary and helpful in that case, right? And I just want that for all the things, right? Whenever I have an abstraction, if it's an abstraction that I understand, then I should have a way of zooming into it while not having to switch languages, right? So that's where you shouldn't dump me out to CloudFormation to understand what you're doing. You should help me understand the low-level things in the same language. And if it's not something that I need to understand, it should be a managed service. It shouldn't be a bunch of stuff that I still own that I haven't looked at. Jeremy: Makes sense. Got a question, Rebecca? Because I was waiting for you to jump in. Rebecca: No, but I was going to make a joke, but then the joke passed, and then I was like, "But should I still make it?" I was going to be like, "Yeah, but does the CDK let you test in production?" But that was a 30-second-old joke, and then I was really wrestling with whether or not I should tell it, but I told it anyway; hopefully, someone gets a laugh. Ben: Yeah. I mean, there's the thing that Charity Majors says, right? Which is that everybody tests in production. Some people are lucky enough to have a development environment in production. No, sorry. I said that the wrong way. It's: everybody has a test environment. Some people are lucky enough that it's not in production. Rebecca: Yeah. Swap that. Reverse it. Yeah. Ben: Yeah. Jeremy: All right. So speaking of talking to developers and getting feedback from them, I actually put a question out on Twitter a couple of weeks ago and got a lot of really interesting reactions. And essentially I asked, "What do you love or hate about infrastructure as code?" And there were a lot of really interesting things here. I don't know, maybe it might be fun to go through a couple of these and get your thoughts on them. So this is probably not a great one to start with, but I thought it was interesting, because I think it represents the frustration that a lot of us feel. And it was basically that they love that automation minimizes future work, right? But they hate that it makes life harder over time, and that pretty much every approach to infrastructure as code at present is flawed, right? So really there are no good solutions right now. Ben: Yeah. CloudFormation is still a pain to learn and deal with. If you're operating in certain IDEs, you can get tab completion. Jeremy: Right. Ben: If you go to the CDK you get tab completion, which is, I think, probably most of the value that developers want out of it, and then the abstraction, and then all the other fancy things it does, like pipelines, which, again, should be a managed service. I do think that person is absolutely right to complain about how difficult it is. There are many ways that it could be better. One of the things that I think about when I'm using tools is that it's not inherently bad for a tool to have some friction to use it. Jeremy: Right. Ben: And this goes to another infrastructure as code tool that goes even further than the CDK and says, "You can define your Lambda code in line with your infrastructure definition." So this is Pulumi. And there's some other ... I think Punchcard also lets you do some of this.
Basically, it extracts out the bits of your code where you say, "This is a custom thing that glues together two things I'm defining in here, and I'll make that a Lambda function for you." And for me, that is too little friction for defining a Lambda function. Because when I define a Lambda function, just going back to that bringing in of ownership, every time I add a Lambda function, that's something that I own, that's something that I have to maintain, that I'm responsible for, that can go wrong. So if I'm thinking, "Well, I could have API Gateway direct into DynamoDB, but it'd be nice if I could change some of these fields. And so I'm just going to drop in a little sprinkle of code, three lines of code in between here, to do some transformation that I want," that is all of a sudden an entire Lambda function you've brought into your infrastructure. Jeremy: Right. That's a good point. Ben: And so I want a little bit of friction to do that, to make me think about it, to make me say, "Oh, yeah, downstream of this decision that I am making, there are consequences that I would not otherwise think about if I'm just trying to accomplish the problem," right? Because I think developers, humans in general, tend to be a bit shortsighted when you have a goal, especially when you're being pressured to complete that goal and you're like, "Okay, well, I can complete it." The consequences for later are always a secondary concern. And so you can change your incentives in that moment to say, "Okay, well, this is going to guide me to say, 'Ah, I don't really need this Lambda function in here.' Then I'm better off in the long term while accomplishing that goal in the short term." So I do think that there is a place for tools making things difficult. That's not to say that the amount of difficulty that infrastructure as code has today is at all reasonable, but I do think it's worth thinking about, right? I'd rather take on the pain of creating an ASL definition by hand for an express workflow than the easier thing of writing Lambda code, because I know the long-term consequences of that. Now, if that could be flipped, where it was harder to write something that took on more ownership and the lighter-ownership thing was just easy to do, right? You'd always do the right thing. But I think it's always worth saying, "Can I do the harder thing now to pay off later?" Jeremy: And I always call those shortcuts "tomorrow-Jeremy's problem." That's how I like to look at those. Ben: Yeah. Yes. Jeremy: And the funny thing about that too is I remember right when EventBridge came out and there was no CloudFormation support for a long time, which was super frustrating. But Serverless Framework, for example, implemented a custom resource in order to do that. And I remember looking at a clean stack and being like, "Why are there two Lambda functions there that I have no idea about?" I'm like, "I didn't publish ..." I honestly thought my account was compromised, that somebody had published a Lambda function in there, because I'm like, "I didn't do that." And then it took me a while to realize, I'm like, "Oh, this is what this is." But if it is that easy to just create little transform functions here and there, I can imagine there being thousands of those in your account without anybody knowing that they even exist. Ben: Now, don't get me wrong. I would love to have the ability to drop in little transforms that did not involve Lambda functions. So, in other words, the thing that VTL does for API Gateway REST APIs, but without it being VTL and being ...
Because that's hard, and it's also restricted in what you can do, right? It's not, "Oh, I can drop in arbitrary code in here." But enough to say, "Oh, I want to flip ... These fields should go from a key-value mapping to a list of key-value pairs," right? In a way that addresses how inconsistently tags are defined across services, those kinds of things. Right? And you could drop that into any service, but once you've defined it, there's no maintenance for you, right? You're writing JavaScript, but it's not actually a JavaScript engine underneath or something. It's just getting translated into some big multi-tenant fancy thing. And I have a hypothesis that that should be possible. You should be able to do it where you could even do it in the parsing of the JSON, being able to do transforms without ever having to have the whole object in memory. And if we could get that, then, "Oh, sure. Now I have sprinkled all over the place all of these little transforms." Now there's a little bit of overhead in whether the transform is defined correctly or not, right? But once it is, then it just works. And having all those little transforms everywhere is then fine, right? And that incentive to make it harder doesn't need to be there, because it's not bringing ownership with it. Rebecca: Yeah. It's almost like taking the idea of tomorrow-Jeremy's problem and actually switching it to say tomorrow-Jeremy's celebration, where tomorrow-Jeremy gets to look back at past-Jeremy and be like, "Nice. Thank you for making that decision, past-Jeremy." Because I think we often do look at it in terms of tomorrow-Jeremy will think of this, will solve this problem, rather than approaching it by saying: how do I make tomorrow-Jeremy thankful for today-Jeremy? And that's a simple linguistic switch, but a hard switch to actually make decisions based on. Ben: Yeah. I don't think tomorrow-Ben is ever thankful for today-Ben. I think tomorrow-Ben is thankful for yesterday-Ben setting up the incentives correctly so that today-Ben will do the right thing for tomorrow-Ben. Right? When I think about people, I think it's easier to convince people to accept a change in their incentives than to convince them to fight against their incentives sustainably. Jeremy: Right. And I think developers, and I'm guilty of this too, I mean, we make decisions based off of expediency. We want to get things done fast. And when you get stuck on that problem you're like, "You know what? I'm not going to figure it out. I'm just going to write a loop, or I'm going to do whatever I can do just to make it work." Another if statement here isn't going to hurt anybody. All right. So let's move to ... Sorry, go ahead. Ben: We shouldn't feel bad about that. Jeremy: You're right. Ben: I was going to say, we shouldn't feel bad about that. And that's why I don't want tomorrow-Ben to have to be thankful for today-Ben, because the implication there is that today-Ben is fighting against his incentives to do good things for tomorrow-Ben. I don't want to have to get to that point; I want the right path to be the easiest path, right? Which means putting friction in the right places. Then, with today-Ben, it's never a question of whether today-Ben is doing something that's worth being thankful for. It's just doing the job, right? Jeremy: Right. No, that makes sense. All right. I've got another question here that I think falls under the category of service discovery, which I know is another topic that you love.
So this person said, "I love IaC, but hate the fuzzy boundaries where certain software awkwardly falls. Things like Istio and Prometheus and cert-manager. They can be considered part of the infrastructure, but then it's awkward to deploy them with something like Terraform due to circular dependencies relating to K8s and things like that." So, I mean, I know that we don't have to get into the actual details of that, but I think that is an important aspect of infrastructure as code, where best practices sometimes are: deploy a stack that has your permanent resources, and then deploy a stack that maybe has your more ephemeral ones, the ones that are going to be changing, the more mutable ones, maybe your Lambda functions and some of those sorts of things. If you're using Terraform, or you're using some of these other services as well, you do have that really awkward mix where you're trying to use outputs from one stack in another stack and trying to do all that. And really, I mean, there are some good tools that help with it, but, I mean, just overall thoughts on that. Ben: Well, we certainly need to demand better of AWS services, when they design new things, that they need to be designed so that infrastructure as code will work. So this is the S3 bucket notification problem. A very long time ago, S3 decided that they were going to put bucket notifications as part of the S3 bucket. Well, CloudFormation at that point decided that they were going to put bucket notifications as part of the bucket resource. And S3 decided that they were going to check permissions when the notification configuration is defined, so that you have to have the permissions before you create the configuration. This creates a circular dependency when you're hooking it up to anything in CloudFormation, because the notification depends on the resource policy of an SNS topic, an SQS queue, or a Lambda function, and that depends on the bucket name, if you're letting CloudFormation name the bucket, which is the best practice. So the bucket name has to exist, which means the resource has to have been created; but the notification depends on the thing that's notifying, which doesn't have the name, and the resource policy doesn't exist, so it all fails. And this is solved in a couple of different ways. One of which is: name your bucket explicitly; again, not a good practice. Another is what SAM does, which says, "The Lambda function will say I will allow all S3 buckets to invoke me." So it has a star permission in its resource policy, so that the notification will work. None of which is good. Or there are custom resources that get created, right? Now, if those resources had been designed with infrastructure as code as part of the process, then it would have been obvious: "Oh, you end up with a circular dependency. We need to split out bucket notifications as a separate resource." And not enough teams are doing this. Often they're constrained by the API that they develop first ... Jeremy: That's a good point. Ben: ... they come up with the API, which often makes sense for the Console experience that they desire. So this is where API Gateway has this whole thing where you create all the routes and the resources and the methods and everything, right? And then you say, "Great, deploy."
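[A sketch of the notification cycle Ben described a moment ago, as a CloudFormation template fragment written out as a Python dict. Resource names are illustrative, and the "Handler" AWS::Lambda::Function is assumed to be defined alongside these resources.]

```python
# The bucket's notification config needs the function's ARN; the function's
# invoke permission needs the bucket's ARN; and S3 validates the permission
# when the notification is defined. With a CloudFormation-generated bucket
# name, there is no order in which all three can come into existence.
resources = {
    "UploadBucket": {
        "Type": "AWS::S3::Bucket",
        "Properties": {
            "NotificationConfiguration": {
                "LambdaConfigurations": [{
                    "Event": "s3:ObjectCreated:*",
                    # ...depends on the function.
                    "Function": {"Fn::GetAtt": ["Handler", "Arn"]},
                }]
            }
        },
    },
    "InvokePermission": {
        "Type": "AWS::Lambda::Permission",
        "Properties": {
            "Action": "lambda:InvokeFunction",
            "FunctionName": {"Ref": "Handler"},
            "Principal": "s3.amazonaws.com",
            # ...depends on the bucket, which depends on the notification,
            # which S3 checks against this permission: a cycle.
            "SourceArn": {"Fn::GetAtt": ["UploadBucket", "Arn"]},
        },
    },
}
```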
And in the Console, you only need one mutable working copy of that at a time, but it means that you can't create two deployments or update two stages in parallel through infrastructure as code and API Gateway, because they both talk to this mutable working-copy state and would overwrite each other. And if infrastructure as code had been on their list, it would have been: "Oh, if you have a definition of your API, you should be able to go straight to the deployment," right? And so I try to push that upstream, which to me is more important than infrastructure as code support at launch. People are often like, "Oh, I want CloudFormation support at launch," but that often means that they get no feedback from customers on the design and therefore make it bad. KMS asymmetric keys should have been a different resource type, so that you can easily tell which key types are in your template. Jeremy: Good point. Yeah. Ben: Right? So that you can use things like CloudFormation Guard more easily on those. Sure, you can control the properties or whatever, but you should be able to think in terms of "I have a symmetric key or an asymmetric key in here." And they should be treated completely separately, because you use them completely differently, right? They don't get used in the same place. Jeremy: Yeah. And it's funny that you mentioned the lacking support at launch, because that was another complaint that was quite prevalent in this thread here: people complaining that they don't get that CloudFormation support right away. But I think you made a very good point, that they do build the APIs first. And that's another thing. I don't know which question it was or which one of these mentioned it, but there was a lot of anger over the fact that you go to the API docs, or you go to the docs for AWS, and it focuses on the Console and it focuses on the CLI, and then it gives you the API stuff and very little mention of CloudFormation at all. And usually you have to go to a whole separate set of docs to find the CloudFormation. And it really doesn't tie all the concepts together, right? So you get just a block of JSON or of YAML and you're like, "Am I supposed to know what everything does here?" Ben: Yeah. I assume that's data-driven. Right? And we exist in this bubble where everybody loves infrastructure as code. Jeremy: True. Ben: And AWS has many more customers who set things up using the Console, people who learn by doing it first through the Console. I assume that's true; if it's not, then AWS has somehow gotten on the extremely wrong track. But I imagine that's how they find that they get the right engagement. Now, maybe the CDK will change some of this, right? Maybe the amount of interest that it's generating will get us to the point where blogs get written with CDK programs in them. I think that presents different problems, about what that CDK program might hide from you when you're learning about a service. But yeah, it's definitely not ... I wrote a blog for AWS, and my first draft had it as CloudFormation, and then we changed it to the Console. Right? And ... Jeremy: That must have hurt. Did you die a little inside when that happened? Ben: I mean, no, because those are definitely AWS's users, right? That's the way in which they interact with AWS, and they should be able to learn it that way, right? Because, again, developers are often not fully in control of this process. Jeremy: Right.
That's a good point. Ben: And so they may not be able to say, "I want to update this through CloudFormation," right? Either because their organization says it, or just because their team doesn't work that way. And I think AWS gets requests to prevent people from using the Console, but also to force people to use the Console. I know that at least one of them is possible in IAM. I don't remember which, because I've never encountered it, but I think it's possible to make people use the Console. I'm not sure, but I know that there are companies who want both, right? There are companies who say, "We don't want to let people use the API. We want to force them to use the Console." There are companies who say, "We don't want people using the Console at all. We want to force them to use the APIs." Jeremy: Interesting. Ben: Yeah. There are a lot of AWS customers, right? And there's every possible variety of organization, and AWS should be serving all of them, right? They're all customers. And certainly, I want AWS to be leading the ones that are earlier in their cloud journey and on the serverless ladder to getting further, but you can't leave them behind; I think that's important. Jeremy: So that people argument, and those different levels, and coming in at a different, I guess, level of comfort with APIs versus infrastructure as code and so forth. There was another question, or another comment, on this that said, "I love the idea of committing everything that makes my solution to text and resurrecting an entire solution out of nothing other than an account key. Loved the ability to compare versions and unit test every bit of my solution, and not having to remember that one weird setting if you're using the Console. But hate that it makes some people believe that any coder is now an infrastructure wizard." And I think this is a good point, right? And I don't 100% agree with it, but I think it's a good point that basically ... Back to your point about creating these little transformations in Pulumi: you could do a lot of damage, I mean, good or bad, right? When you are using these tools. What are your thoughts on that? I mean, is this something where ... And again, the CDK makes it so easy for people to write these constructs pretty quickly and spin up tons of infrastructure without a lot of guardrails to protect them. Ben: So I think if we tweak the statement slightly, there's truth there, which isn't about the self-perception but about what they need to be. Right? I think this is more about serverless than about infrastructure as code. Infrastructure as code is just saying that you can define it. Right? I think it's more about the resources that are in a particular definition that require that. My former colleague, Aaron Camera, says, "Serverless means every developer is an architect," because you're not in that situation where the code you write goes onto something; you write the whole thing. Right? And so you do need to have those ... You do need to be an infrastructure wizard, whether or not you're given the tools to do that and the education to do that, right? Not always, only if you're lucky. And the self-perception is, again, an even different thing, right? Especially if coders think that there's nothing to be learned ... If programmers, software developers, think that there's nothing to be learned from the folks who traditionally define the infrastructure, which is Ops, right? They think, "Those people have nothing to teach me because now I can do all the things that they did."
Well, you can create the things that they created, and it does not mean that you're as good at it ... Jeremy: Or responsible for monitoring it too. Right. Ben: ... and have the ... Right. The monitoring, the experience of saying these are the things that will come back to bite you, that are obvious, right? This is how much ownership you're getting into. There's very much a long-standing problem there of devaluing Ops as a function and as a career. And for my money, when I look at serverless, I think serverless is also making the software development easier, because there's so much less software you need to write. You need to write less software that deals with the hard parts of these architectures, the scaling, the distributed computing problems. You still have those big computing problems, but you're considering them functionally rather than coding things that address them, right? And so I see a lot of operations folks who come into serverless and learn a new programming language or just upskill, right? They're writing Python scripts to control stuff, and then they learn more about Python to be able to do software development in it. And then they bring all of that Ops experience and expertise into it and look at something and say, "Oh, I'd much rather have Step Functions here than something where I'm running code for it, because I know how much my scripts break, and those kinds of things, when an API changes or I have to update it or whatever it is." And I think that's something that Tom McLaughlin talks about, having come from an Ops background into serverless. And so I think there's definitely a challenge there in both directions, right? Ops needs to learn more about software development to be more engaged in that process. Software development does need to learn much more about infrastructure, and is also at this risk of approaching it from an "I know the syntax, but not the semantics" sort of thing. Right? We can create ... Jeremy: Just because I can doesn't mean I should. Ben: ... an infrastructure. Yeah. Rebecca: So Ben, as we're looping around this conversation and coming back to this idea that software is people, and that really software should enable you to focus on the things that do matter: I'm wondering if you can perhaps think of, as pristine as possible, an example of when you saw this working. Maybe it was while you've been at iRobot, or a project that you worked on on your own outside of that, but this moment where you saw software really working as it should, and how it enabled you or your team to focus on the things that matter. If there's a concrete example that you can give of when you saw it working really well and what that looked like. Ben: Yeah. I mean, iRobot is a great example of this, having been a company that had never needed software that scaled to consumer electronics volumes, right? Roomba volumes. And then needing to build an IoT cloud application to run connected Roombas, and being able to do that without having to gain that expertise. Without having to build a team that could deal with auto-scaling fleets of servers and all of those things, we were able to build it up completely serverlessly, and so skip an entire level of organizational expertise, because that's just not necessary to accomplish those tasks anymore. Rebecca: It sounds quite nice. Ben: It's really great. Jeremy: Well, I have one more question here that I think could probably end up ... We could talk about it for another hour.
So I will only throw it out there, and maybe you can give me a quick answer on this, but I actually had another Twitter thread on this not too long ago that addressed this very problem. And this is the idea of the feedback cycle on these infrastructure as code tools, where oftentimes to deploy infrastructure changes, I mean, it just takes time. In many cases things can run in parallel, but as you said, there are race conditions and things like that, and sometimes things just have to be synchronous. So is this something where there are ways, where you see in the future, that these mutations to your infrastructure, or things like that, could potentially happen faster to get a better feedback cycle? Or do you think that's just something that we're going to have to deal with for a while? Ben: Yeah, I think it's definitely a very extensive topic. I think there are a few things. One is that the deployment cycle needs to get shortened. And part of that, I think, is splitting dev deployments from prod deployments. In prod, it's okay for it to take 30 seconds, right? Or a minute, or however long, because that's at the end of a CI/CD pipeline, right? There are other things that are happening as part of that. Now, you don't want that to be hours or whatever it is, right? But it's okay for that to be proper and to fully manage exactly what's going on in a principled manner. When you're doing development, it would be okay to, for example, change the Lambda code without going through CloudFormation, right? And this is what Architect does: there's a notion of a dirty deploy, which just packages up the code. Now, if your resource graph has changed, you do need to deploy again. Right? But if the only thing that's changing is your code, sure, you can go and say, "Update function code," on that Lambda directly, and that's faster. But calling it a dirty deploy is, I think, important, because that is not something that you want to do in prod, right? You don't want there to be drift between what the infrastructure as code service understands and what's deployed. But then you go further than that and imagine: there's no reason that you actually have to do this whole zip file process. You could be rsyncing the code directly, or you could be operating over SSH on the code remotely, right? There are many different ways in which the loop from "I have a change in my Lambda code" to "that Lambda has that change" could be even shorter than that, right? And for me, that's what it's really about. I don't think that local mocking is the answer. You and Brian LeRoux were talking about this recently. I mean, I agree with both of you. So I think about it as: I want unit tests of my business logic, but my business logic doesn't deal with AWS services. So I want to unit test something that says, "Okay, I'm performing this change in something," and that's entirely within my custom code. Right? It's not touching other services. It doesn't mean that I actually need adapters, right? I could be dealing with the native formats that I'm getting back from a given service, but I'm not actually making calls out of the code. I'm mocking out, "Well, here's what the response would look like." And so I think that's definitely necessary in the unit testing sense of saying, "Is my business logic correct? I can do that locally. But then is the wiring all correct?" That is something that should only happen in the cloud. There's no reason to mock API Gateway into Lambda locally, in my mind.
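[A minimal sketch of the kind of local test Ben is describing: pure business logic exercised against a canned, DynamoDB-shaped response, with no AWS calls and no API Gateway mock. All names here are illustrative.]

```python
def summarize_order(dynamo_item: dict) -> dict:
    """Pure business logic, working on the native DynamoDB item format."""
    return {
        "order_id": dynamo_item["pk"]["S"],
        "total_qty": sum(
            int(line["M"]["qty"]["N"]) for line in dynamo_item["lines"]["L"]
        ),
    }


def test_summarize_order():
    # Canned response in the shape DynamoDB would return; no calls leave the code.
    fake_item = {
        "pk": {"S": "order-123"},
        "lines": {"L": [{"M": {"qty": {"N": "2"}}}, {"M": {"qty": {"N": "3"}}}]},
    }
    assert summarize_order(fake_item) == {"order_id": "order-123", "total_qty": 5}
```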
You should just be dealing with the Lambda side of it in your local unit tests, rather than trying to set up this multiple thing. Another part of the story is, okay, so these deploys have to happen faster, right? And then how do we help set up those end-to-end tests and give you observability into them? Right? X-Ray helps, but until X-Ray can sort through all the services that you might use in a serverless architecture, and can deal with how it works in my Lambda function when it's batching from Kinesis or SQS into my function, so that multiple traces are now being handled by one invocation, right? These are problems that aren't solved yet. Until we get that kind of inspection, it's going to be hard for us to feel as good about cloud development. And again, this is where I feel sometimes there's more friction there, but there's a bigger payoff. It's one of those things where, again, you're fighting against your incentives, which is not the place that you want to be. Jeremy: I'm going to stop you before you disagree with me anymore. No, just kidding! So, Rebecca, do you have any final thoughts or questions for Ben? Rebecca: No. I just want to say to both of you, and to everyone listening, that I hope your today-self is celebrating your yesterday-self right now. Jeremy: Perfect. Well, Ben, thank you so much for joining us and being a guinea pig, as we said, on this new format that we are trying. Excellent guinea pig. Excellent. Rebecca: An excellent human too, but also a great guinea pig. Jeremy: Right. Right. Pretty much so. So if people want to find out more about you, read some of the stuff you're doing and working on, how do they do that? Ben: I'm on Twitter. That's the primary place. I'm on LinkedIn; I don't post much there. And then I write articles that show up on Medium. Rebecca: And just so everyone knows your Twitter handle, I'll say it out loud too. It's @ben11kehoe, K-E-H-O-E, ben11kehoe. Jeremy: Right. Perfect. All right. Well, we will put all that in the show notes, and hopefully people will like this new format. And again, we'd love your feedback on this: things that you'd like us to do in the future, any ideas you have. And of course, make sure you reach out to Ben. He's an amazing resource for serverless. So again, thank you for everything you do, and thank you for being on the show. Ben: Yeah. Thanks so much for having me. This was great. Rebecca: Good to see you. Thank you.
Hello and welcome to episode 412 of the Reversim podcast. Today's date is June 13th, 2021, and I'd say it's a somewhat historic date: apparently a government was formed today. We don't know whether it will last, but the vote took place just today and we're on edge about what comes next [worried that the number of governments might overtake the number of Reversim episodes?]. Today we're recording a podcast with Yinon from Via, who will introduce himself shortly, and we also have a special guest: filling in for Ori today is Yonatan from Outbrain. Hi, Yonatan! Hello and welcome. Yonatan has been at Outbrain for quite a few years now and has been a guest on past episodes, though that's really ancient history by now [say, 328 The tension between Agility and Ownership, or Final Class 23: IDEs, or 131 uijet, or 088 Final Class 2 ... there are more]. So first of all, since 412 is Precondition Failed (everyone knows that, right? I definitely didn't have to look it up before the show, I knew it by heart ...), Yonatan, tell us about your precondition. Or, in other words, who are you? (Yonatan) First of all, I'm a long-time Reversim listener; I think when I joined Outbrain ten years ago, the podcast was one of the reasons. I joined as a backend engineer, and for the past five years I've been leading engineering at Outbrain. (Ran) "Leading engineering at Outbrain" is your modest way of saying you're the head of engineering? (Yonatan) Head of engineering ... (Ran) Nice. Well, engineering at Outbrain is a big group, you have plenty of work, and thanks for coming. (Yonatan) My pleasure. (Ran) So, Yinon from Via, and today we're going to talk about the topic of Serverless, but from other angles, angles we haven't covered yet. Before we get into the topic, tell us a bit about yourself. (Yinon) Hi, I'm Yinon, nice to meet you. I've been at Via for a bit over three years. Just as background, since we mentioned preconditions: I came to Via from Ravello. For those who don't know, Ravello was a pretty big supporter of Reversim, which is also how I got to know the podcast, and how I got to you, Ran ... I'll tell you a bit about Via. I think they've already been guests on the podcast [indeed: 360 Via], but maybe from my personal angle this time; I always like telling everyone a tall tale about Via ... (Ran) By the way, that guest was a product person, so it was less technical; today we're going to get much more technical. (Yinon) Great, that's my introduction. So this is how Via actually started ... I always tell this tale, which nobody in Via's hallways has ever confirmed or denied. As the story goes, our CTO and founder, Oren, was walking down Allenby Street one day trying to catch a shared taxi toward Tel Aviv University, and discovered that, back in those days, it was pretty complicated. Seven or eight years ago, catching a shared taxi wasn't that easy: you had to find one, know where it went, how to board it, what to do with it, and once you got on you had to pay, and it was all terribly complicated ... And what crossed his mind was: "Wow, this thing is a cool idea, a charming sort of Middle-Eastern chaos" for whoever was already into public transit; and on the other hand, how great it would be if there were some app that let you at least order a shared ride, get from wherever you are to wherever you want, that told you how to get there and what to do, and paid for you ... It felt like a dream. So he went and made it real, and that's how Via began its life, in New York ... From there we rolled on to many other places. We started as a sort of consumer service, and from there it evolved into different services like pre-booking and paratransit. Today we also handle vehicles like school buses; that is, the whole school-bus service operation of the City of New York. So we can really do a great many things. And all of this is built on top of one single technology stack, which, as Ran hinted earlier, runs mostly if not entirely on Serverless. (Ran) Yes, we'll get to that story in a moment ... By the way, I assume many of the listeners know Via, and your employees have given conference talks in the past. There's an interesting mix there of technology, algorithmics, Data Science and other things, and of course a product story as well. Someone hearing your pitch might say, "Oh! That's Uber!" or others, but it's not ...
This time we won't get into that, because we're not doing a product episode. There are differences, but we won't go into them [and again — 360 Via].

Now let's get into the technology. One of the interesting things about Via is that the entire technology stack, or most of it, runs on a Serverless platform. So tell us — how is your stack built?

(Yinon) Let's start with some history. Historically, Via — like every good Israeli high-tech company, an Israeli startup — started with a monolith which, to our joy or sorrow, still exists today: that famous monolith called "Via Server", a very original name... That Via Server originally ran on a lot of EC2 machines on Amazon — pretty standard, as you'd expect, long before Kubernetes. What we discovered is that as you grow — and I mentioned the places Via expanded into — the stack itself started getting very, very expensive... On one hand, we kept running into scale-up problems: we had to scale that monolith up again and again to support constantly growing traffic. On the other hand, if we left it at very high scale, AWS enjoyed it a lot and Via rather less... Add to that Via's usage pattern — mostly in the beginning, but still today — there are peak periods. Think of Tel Aviv: we also operate the Bubble service there, so consider that in the morning hours, roughly 07:00 to 09:30, whoever tries to catch a Bubble knows it's not easy — the vehicles are very, very full, and you're bombarding our servers... The same thing happens in the evening. On the other hand, whoever tries to catch a Bubble around 12:00 has a very easy time and finds one very quickly — simply because there's less demand; fewer people want to get to and from work.

(Ran) But it sounds like you know in advance what the peak hours are going to be... Why not simply scale up and scale down, when you even have five minutes to warm up the servers if you know ahead of time...?

(Yinon) Great — that's exactly what we said too... "Great! Life is easy — we know how to warm up and scale down." The problem is that for that, you need to predict and understand what the traffic will actually be — and each time, just like in development, you have to add buffers... "How many will there be tomorrow, exactly? 1,000 riders? So let's put up 10 servers, or 15..." And then you wake up the next morning — and it's raining, and rain is a blow... Suddenly the 1,000 riders become 2,000... And if you look at a city at New York's scale, obviously a much bigger city, there it jumps from 10,000 riders to 100,000 — in one blow, suddenly everyone is hammering. And if there's no rain — fine, it means someone has to notice at 06:00 in the morning and say "wow, we'd better scale the servers up even more" — and you can imagine how many times someone missed it, or missed by a few hours, or misjudged the estimate and added only 50 servers instead of 100 — and we lost out.

(Ran) I get the painful point — getting up at 06:00 in the morning... I mean, if this were a company that's active at noon, maybe nobody would care and you'd never have reached Serverless, but getting up at 06:00 in the morning is a whole different story...

(Yinon) That's a really great reason to move to something else... The second problem is that even after we brought those 50 servers up, someone also had to remember to take them down afterwards... It's never as simple as "we'll spin up now and remember later", because people forget, and the next day... and suddenly the AWS bill doesn't look so hot.

So we looked for a solution that would also give us automatic scaling. On one hand you say "fine, there are more modern solutions"... Kubernetes arrived at a later stage, but even before that there were Auto-Scaling Groups on AWS; you could build with those too. The problem is that when you look at it for a moment — a monolith like this, even though it's written in Python, which boots relatively fast — still, until it comes up and does its init and downloads... Think for a moment about what Via does: it needs to download the maps, download configurations, prepare all sorts of things...
And the warm-up and container build time is not short — it can take whole minutes, depending of course on the size of the traffic and of everything that has to be loaded — and that, of course, really hurts. A scale-up that relies only on that kind of auto-scaling isn't fast enough, and there's a not-short-enough window during which you lose. You lose money, you lose traffic — and the user trying to get a ride gets genuinely bad service. That's why we started looking at other things.

(Ran) But a load time like that — surely Yonatan is about to jump in and argue: "Wait! You're not a monolith! There are microservices!"... So why did the monolith need to load all the maps of the whole world and everything else? True, it's heavy — but there are other solutions for that, not only Serverless...

(Yinon) Great — that's exactly the point where we looked — and thanks, Yonatan, for the question... We looked and said: "OK, one solution is indeed to say, fine, let's build this out of all sorts of microservices." In fact, you could say that is the solution we chose — the only question is what its transport is, what the pipeline is through which we actually bring it all up. One option was to say, "OK, we'll write the code as a collection of small servers, in Python..." — and by the way, in some places that's exactly what we did.

There are places where... Via isn't dogmatic, saying "Serverless is the only way" — that's not the right way for us to look at it. We say that in places where it can be done easily — specifically not by standing up a whole, heavy service on Kubernetes, but by looking at the advantages of other things — that's where we looked at Serverless.

If you ask why we chose to go with Serverless in specific parts, we set ourselves a few interesting goals. We said we want as little DevOps involvement as possible, because DevOps is expensive and complicated — not just the people, but the sheer time invested in DevOps, in standing up environments and arranging them, is very expensive. Even in a very successful environment like Kubernetes, which really does have many advantages, there's still a great deal of configuration to do: you need to understand the parameters that determine scale-up, scale-down and scale-in — and effectively configure the server and fine-tune it constantly in order to reach the results you want. So even in a classic microservices environment like that, with containers and pods, most of that configuration work is on us — our responsibility...

(Ran) I think there's a law — the law of conservation of energy in technology: work doesn't disappear, it changes form... If you used to wire cables, today you configure VPCs; and if you used to configure some machine, today you produce a script or some configuration in Terraform or any other tool... The specialization changes, but the work doesn't disappear.

(Yonatan) You said you're not dogmatic — that is, you don't claim this is the only way. Are there things where you still use... does the monolith still play a role? Are the microservices still around? Or is it...

(Yinon) First of all, the monolith still exists — not in every deployment and not everywhere, but it still sits at the heart of some of our deployments. And we still have several other services that are standard microservices, with containers — some written in Java, some in Python — that still exist as classic microservices, Docker containers inside Kubernetes. Mainly in places with very high memory consumption, where we need to hold, say, the map — a map, naturally, is an object that takes a great deal of memory, and there it's actually more convenient for us to keep it inside a container. So we have whole Kubernetes stacks too. However, in places where it's logic, or "container glue" — and that, by the way, is a lot of what Via does...
Think, for example, of drivers moving around the city and reporting their location to us — they constantly need to report where they are and receive instructions. That's something that doesn't require much memory; what it really requires is the ability to scale up and scale in according to how many drivers are on the road right now. So instead of standing up containers that know how to handle this, we found it much easier to stand up "micro-micro-micro-containers", or "nano-containers" — which, practically speaking, is Lambda... That's exactly what it does.

(Yonatan) So that's a use case of, say, receiving the drivers' locations and writing them somewhere, I assume? Are there other use cases — say, if I want to order a ride, is that also...

(Yinon) That's on Serverless too, absolutely. A request arrives at some Lambda that sits, say, behind an ALB or some API Gateway. It's wired directly into the Lambda — and from there, in effect, a whole "chain of Lambdas" can run... A request arrives — we identify the rider — from there it goes to our booking hub, which is the hub that "talks" to the vehicles — a booking is made — it moves on to a Lambda that knows how to handle payments, since obviously we need to check that you're allowed to board the ride... From there it also goes to wherever the driver receives instructions; from there it calls the mapping service — which, to remind you, is a container — and we tell it, "please generate a new route for the driver" that will lead to picking up that rider. Eventually this is translated into instructions back to the driver — and the driver receives the instruction and keeps transmitting those reports to us, which we call Heartbeats, in the original naming... And that comes back into our systems and continues the same flow I described before.

(Ran) Let's talk about money for a moment — economics... Earlier you mentioned there were EC2 instances, and you had to scale up and then maybe forgot to scale down, and it costs money, and so on... I've had the chance to work with Serverless, in a startup much smaller than Via, and it was a few years ago — five years or so — and back then it was clear that Serverless is expensive... I mean, there are advantages on the operations side, there's an operational model, there's a healthy programming model — I liked all of those very much — but one thing was clear: it was going to be very expensive once we scaled up. If nothing is running, then true, it costs nothing — if nobody calls your function, then like a tree falling in the forest with no one to hear it, it didn't really fall... So clearly that's easier than maintaining an EC2 instance. But once there's significant traffic and the function is being called all the time, it was clear — at least back then — that it's also going to be much, much more expensive than maintaining your own microservice. What do the economics of this look like today?

(Yinon) Let me tell it like this... We'll start with a story and work our way there slowly. At the start of the COVID crisis [the first round...], you can imagine what it did to ride services... In one blow — one morning, within roughly a week — we went from hundreds of thousands of riders in New York to about ten. Nobody rode, nobody moved — and that was worldwide, not just New York. Now, if you look for a moment at the economics, at what cost money — in one blow all our Lambdas dropped to zero; we stopped paying for them entirely. Whereas all those containers — the ones left in the monolith and the like — kept quite a few dollars flowing straight into Mr. Bezos's pockets... [have some empathy — the man has a spaceship to build]. So at least at the scale-up / scale-in level, the economics are extremely good — at no point do we have to keep "spares" around to handle load "just in case". You can talk about warm-ups, but you don't keep spares.

And on the other hand, even when it is up and doing its work, we see that the actual processing time, during which the Lambda is really working, is very low — mainly because we use asynchronous calls. If you work in a more asynchronous fashion — that is, the incoming calls
spend most of their time moving between Lambdas inside queues, with no synchronous calls going out — then suddenly the time the Lambda actually runs is very, very short. Specifically, by the way — a few months ago AWS changed their billing granularity for Lambda from a minimum of 100 milliseconds to a minimum of 1 millisecond, and that significantly improved the cost, especially for short Lambdas, which is a lot of what we do.

(Ran) I see — meaning that if, say, the Lambdas... Let me describe some flow of a Lambda calling a Lambda, and so on — if each one waits for the next synchronously, you pay the bill for all of them along the way; but if it happens asynchronously, passing through SQS or any other mechanism — then you pay only for the minimal processing time. Fine, I understand...

(Yonatan) What's also interesting here, Ran, is that there's a direct link between the business — the traffic you receive — and the cost, whereas with regular services it's much harder to make that link. The server lives all the time, even if it gets no traffic, even if you aren't "getting paid".

(Yinon) Exactly — among Via's cities there are whole cities where there is no service at certain hours of the day. The Bubble service, for example, runs only during the day; it ends around 22:00-23:00 at night [even a bit later]. Consider that all night long some server would be up, and it still has to answer questions: if some rider opens the Bubble app at 02:00 in the morning, we'll answer that there is currently no service — but for that, some server has to be up... So the more of these things we move to Serverless — if someone makes a request, they'll get an answer; and if not, there's no need to bring the service up at all.

(Ran) So you're saying that because you're characterized by very high elasticity — maybe COVID is an extreme example, but even day-to-day there's elasticity: there are weekends, different hours of the day, holidays... In any case, quite significant elasticity — that makes the Serverless model more worthwhile for you. I wonder — I don't even know if there's an answer — but I wonder: for someone with a relatively balanced workload around the clock, would it pay off for them too?

(Yinon) It's always a question of what your workload really is — and how much of what you do is truly broken down into microservices all the way. What do I mean? For example, looking at that Via service again: what really accounts for a huge share of the calls and the traffic is those driver Heartbeats — something we know is very heavy in terms of the number of calls made and the total processing time spent inside, even if each individual bit of processing is very short. So if you have one service holding those Heartbeats together with another piece of functionality, you're effectively creating very strong coupling of two layers onto one server. In the Serverless world, it's very easy to break that down into genuine "nano-services" — a service that perhaps has no right to exist on its own, but scaling up just that small piece of the service becomes very easy.

(Ran) OK, fine — that was the economic consideration. Now let's look at the methodological one. I'll put it this way: have your developers become better because they work stateless? Have they become better because they're forced to run under the constraints of Lambda functions? In other words — how do you see such a methodology affecting the way you develop, the quality of development, code quality, and so on?

(Yinon) There are several answers to that... On one hand, yes — the developers, having no other choice, must think about a world in which there is no central memory, no sharing between... the only sharing between containers is something external, so it makes people think as much as possible about how state is held and what to do with it. We really did come up with many directions and solutions for that — some of which, by the way, are economic considerations as well...
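[A minimal sketch of one hop in the kind of asynchronous Lambda chain described above: the function is triggered by one SQS queue, does its short piece of work, and hands off to the next queue instead of waiting on the next Lambda synchronously. The queue URL and message fields are illustrative, not Via's actual code:]

```python
import json
import os

import boto3

sqs = boto3.client("sqs")
NEXT_QUEUE_URL = os.environ["NEXT_QUEUE_URL"]  # e.g. the payments stage


def handler(event, context):
    # SQS may batch several messages into a single invocation.
    for record in event["Records"]:
        ride_request = json.loads(record["body"])

        # The short, synchronous piece of work this Lambda owns.
        ride_request["rider_verified"] = True

        # Hand off to the next stage and return immediately --
        # billed time ends here instead of waiting for the next Lambda.
        sqs.send_message(
            QueueUrl=NEXT_QUEUE_URL,
            MessageBody=json.dumps(ride_request),
        )
```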
(Yinon) For example, we discovered that working with databases that are more Serverless in nature, like DynamoDB, comes out much cheaper for us — and also handles traffic bursts more gracefully — than using a "MySQL-style" database, which has a harder time scaling up. And Dynamo, for that matter, is also "if you didn't use it, you didn't pay" — if you didn't read, nothing happened — whereas with MySQL, even in excellent variants like Aurora, you pay as long as the instance is up, no way around it.

Beyond that, we found there are many ways to improve reads from the database. We work, of course, with some ElastiCache, or some S3 as a local cache, which helps us deal with bursts of "suddenly thousands of Lambdas trying to attack the database" [Netflix's next movie?]. Dynamo, for that matter, copes with that quite nicely — that's what it's built for — whereas Aurora, which is quite a good database, enjoys such a traffic burst rather less. There are several solutions from AWS that work and solve this problem — and some we built ourselves: for instance, we stood up a Redis cache on top of the whole thing, using ElastiCache. Another option is to put a proxy in front of the database — effectively doing connection pooling in front of the database itself. That really lets us run a lot of load with a lot of spikiness...

(Ran) Let's dwell for a moment on that connection-pooling situation — I have to respond, because I got burned by this too... Exactly the situation you describe: Lambda functions, with Aurora and MySQL behind them — and thousands of Lambda functions trying to connect to the database... Now, if all those thousands were merely threads inside the same process, there's connection pooling, and... suppose you decide the database allows 300 connections, then 300 threads talk to the database and the rest wait in line. But here — a Lambda doesn't know how to "wait in line"... so they start failing...

(Yinon) Lambdas really are a rather selfish creature in that respect — they don't "look around" much. And there really are two solutions we found and used. One is a built-in AWS solution: they have a proxy designed to solve exactly these problems. You put... think of it as a kind of machine placed in front of the database — effectively an EC2 instance in front of the database. The Lambda connects to that machine, and the machine itself holds a connection pool and tells the Lambda: "OK, wait a second, I'll catch you on the next free connection." Conceptually, it behaves very much like what you described earlier inside a monolith... It's what makes it possible to actually use Aurora [obligatory reference to The Robots of Dawn...].

(Yonatan) By the way, just to complete the picture and the motivation — it's not only that they hammer the database and some of them fail. Why do you create a connection pool in the first place? To save the connection setup time, which over TCP is expensive — but a Lambda can't do that... A Lambda must create the connection anew every time, and then you pay again in latency — and ultimately in dollars too... And that's just the connection-pool example — I think any local cache... anywhere you'd use a local cache in microservices, here you're in trouble; you need other solutions.

(Yinon) True... so there are several ways... Again, when we ran into a similar problem — incidentally in a place where the database was read-mostly — the solution was to use ElastiCache as a sort of cache above the database. That let us provision the database itself much smaller — and all you need is a Lambda that, "every so often", at the relevant refresh interval, simply refreshes the cache. Quite simple — read from the database, push into the cache... and effectively work directly against ElastiCache. We came across a few more solutions along the way, by the way — there's a solution called EFS, which lets the Lambdas share a file system, with much easier access: you don't need to hold a connection, you just access the data directly. And with that proxy, it's really useful for holding a TCP connection open to the database, so you only need to create a small connection to that little middle-box.
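[A sketch of the read-mostly pattern Yinon describes: one scheduled Lambda refreshes a Redis key in ElastiCache from the database, and every other Lambda reads only from the cache instead of opening its own database connection. The table, host, and key names are made up for illustration:]

```python
import json

import boto3
import redis  # redis-py, bundled into the deployment package

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("city-config")  # hypothetical read-mostly table
cache = redis.Redis(host="my-cluster.cache.amazonaws.com", port=6379)


def refresh_cache(event, context):
    # Runs on a schedule (e.g. an EventBridge rule): read the small,
    # rarely-changing dataset once per refresh interval...
    items = table.scan()["Items"]
    # ...and publish it where thousands of Lambdas can read it cheaply.
    cache.set("city-config:all", json.dumps(items, default=str))


def read_config():
    # What every other Lambda does instead of hitting the database.
    cached = cache.get("city-config:all")
    return json.loads(cached) if cached else []
```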
(Ran) Sometimes it happens that you do want control over a server... I mean, we're talking about Serverless, and you run inside some container. But honestly — sometimes you want to set the amount of memory, play with the TCP stack, do all sorts of optimizations on file descriptors, and so on... What do you do when you reach a situation like that? What do you do when you feel you're scratching the glass ceiling inside your Lambda?

(Yinon) First of all, I'd ask you — why? What's the motivation? Because usually, with us, at the end of the day, the business is not that... We're not in the business of fiddling with the internals of some file... If there's no choice, there's no choice — but so far we haven't found a single place where it was actually needed. In other words, we preferred the ease of No-Ops, where all you have to set on a Lambda is its amount of memory — and it runs. I'm exaggerating, of course, and there are a few more things you can set — whether it runs inside a VPC or outside it, there are Security Groups, and so on — but broadly, once you've set them once, you're done, and there's almost nothing left to fiddle with.

Through the amount of memory you effectively set not just the memory but the overall performance of the Lambda. Broadly, when I think about it — when you set the memory, you're determining how many Lambdas run on one AWS container; and the fewer Lambdas run — that is, each takes more memory, so fewer of them run on the container — the more resources you have inside the container: CPU, network card — and that is, in effect, the control you have. All in all, we found that playing a bit with the memory gives us far more than we need in terms of control inside the server. For things that are truly fine-grained — I agree, Lambda isn't the right fit. That's exactly what we go to other places for, like containers and pods.

(Yonatan) If you look back a bit, there used to be these "monsters" — there was WebSphere, and JBoss, and even Tomcat... You'd write your code, do some... they called it a WAR and an EAR and all sorts of curses... and deploy it into some container. When microservices arrived, things went in a rather different direction... Instead of being a "guest" inside some runtime that someone else maintains — which is very big and complex, and your span of control is small — you became the owner: if you run Python, you become the owner of the Python process, of the JVM. And in some way Serverless takes you a bit back — at least "back" in terms of fashion... You're once again a guest inside some runtime that someone else holds and configures. How are you doing with that "retro"?...

(Yinon) I love this retro — because the one holding and configuring that giant server is not me... It's the big guns at AWS, who know exactly what they want — and they're pretty good at this world... That's why I live with it in peace. If I had to maintain that JBoss or that WebSphere, we probably wouldn't be talking today... Since AWS do it, and they really do know exactly what they're doing and are good at it, I live with it quite peacefully. I am, admittedly, at their mercy — and that sounds a bit fatalistic — but at the end of the day, if there's anyone whose hands it's good to be in, it's probably the folks at AWS, who do a pretty good job. And what I really gain from it is that I no longer have to deal with complicated configurations; all in all — I deploy my code, it works — and that's pretty much the story.

Especially since AWS are very open about their implementation and what they're willing to share — to the point that they let you run any runtime you want, bring your own runtime... If you really want to run Fortran code on Lambda, then no problem, you can do it [seriously...]. So it's closed on one side — but there's a lot of openness on the other.

(Yonatan) And I assume that if you really do want to be the owner of the process — if you need to configure the who-knows-how-many descriptors you need — you'll build a microservice that solves the problem...

(Yinon) Exactly. And by the way, even there it's a bit different, but in a certain sense you're hosted inside Kubernetes... That is — you have more control over the process, you truly control the runtime, you have the pod...
On the other hand, you still have some "landlord" who tells you: "Listen, you're not exactly doing whatever you... I'm still the landlord here."

(Yonatan) True.

(Ran) Are you still a "Python shop"? Or...

(Yinon) Still a Python shop...

(Ran) That is — in principle, Lambda makes it even easier for you to diversify your target languages — but that's an opportunity you haven't taken yet.

(Yinon) True — except that what really matters here, and we can talk about it a bit, is the cold start... Look at what happens when a Lambda comes up: when a Lambda starts itself, it has to do a lot of configuration and setup. And bringing up a Python runtime is — except perhaps for Node — the fastest runtime start there is. Significantly faster, for that matter, than bringing up a Java Lambda — and we have a few of those too, for historical reasons — and you can really see that the Java runtime is heavy until it's up. On the other hand, if you use Lambda, one of the drawbacks — which is also an advantage, in a certain sense — is that the Lambda runtime is single-threaded, or at least single-core: there's no real, full support for multithreading running in parallel. You can look at a few processes inside a Lambda, but not multi-threading — which makes it unnecessary, for that matter, to use asyncio or all sorts of complicated threads in Java. We also looked a bit at Go in the past — that modern, cool language [wait for the next Bumpers...]. Writing containers in it is pretty cool, but writing it inside a Lambda is fairly pointless... That is — you can't benefit there at all from all the async machinery that sits inside Go, all the async functions.

(Ran) Right... Technically it's possible, you just don't gain anything.

(Yinon) Exactly — it's still single-threaded, so it's entirely synchronous.

(Ran) What does the developer experience look like? I mean — what happens if production is suddenly slow, or things suddenly get lost, or suddenly... I don't know, a request gets a timeout, things like that? How do you debug a flow? How do you do tracing?... How do you debug Lambda functions that are scattered across — I don't know how many... How many do you have?

(Yinon) Last time I counted, we had a good few thousand functions.

(Yonatan) ...Surely you're starting to miss the monolith, where you could set a breakpoint and see exactly what's happening...

(Yinon) ...Exactly — it is indeed an experience that is... at first daunting... When you look at it for the first time — I remember looking at it and saying, "OK, what do I do?"... I open CloudWatch and try to dig through logs... Not a very interesting experience, not that much fun. And indeed, one of the things Lambda forces on you is clear observability. There are AWS-internal tools — X-Ray, for instance, which AWS pushes hard; a likable tool that helps you do distributed tracing. The main problem with X-Ray is that you have to work to make it work... That is, part of the work is inserting the tracing capability inside...

(Ran) ...Instrumentation.

(Yinon) ...Such instrumentation, exactly... And we preferred a tool that does the instrumentation for us. We dug around a bit and eventually settled on Epsagon. For those who don't know Epsagon — a very interesting tool which, with very little work, lets you go in and trace all our Lambdas together and connect them to one another. It uses a library called Jaeger to do the distributed tracing — a fairly well-known open-source library; their implementation is simply quite good. In effect, it lets us see calls that start in one Lambda and end in another Lambda at the edge of the stack, through all the other Lambdas in between — both their traces and the payloads that passed through the Lambdas — from the outside in, through the various SQS queues, calls to the database, and so on. It also lets us see the logs — and performance: how long each call took, inside.

(Ran) But from my acquaintance with Jaeger — it's excellent when it comes to gRPC or HTTP — all the synchronous things; but asynchronous things, for example passing through SQS or through Kafka — there you already have to invent a solution yourself...
So did they wrap all that for you?

(Yinon) They wrapped the whole thing — they handled it very nicely. You can actually see the calls to SQS and the passage out of it, the outbound call. They put quite a bit into the tracing... they extended Jaeger into their own tracing — that might be worth a separate episode in itself... But we do use Epsagon, and we see full observability, end to end. The only places where it breaks are the places where we didn't add some four lines into the Serverless Framework — which is what we use to deploy the Lambdas — so their automatic wrapping isn't applied there; and in those places you really see when it breaks and how hard it is. And wherever we detect such a place, it's very simple to add that automatic tracing — it's literally a few lines, add a small module and hop! — it does automatic tracing and effectively gives us full observability of absolutely everything.

(Ran) OK, so that's tracing in production — but what does the development experience look like? Say I now need to write some new service, or a new function — do I just deploy it to the cloud and see what happens, or is there something local?

(Yinon) ...That's pretty much it. There are, of course, the basics — if you write Python, unit testing is quite standard, and we obviously have to write it; it's even a sort of compiler-substitute, for lack of an alternative. There's some linting and such — but whoever knows Python, and I assume very few don't, knows that without a unit test or two to see that the code holds up, you can very easily deploy some nonsense... You can run unit tests for simple local debugging. Usually, what we do is simply deploy directly to the cloud from the dev environment and run a set of HTTP calls against the Lambdas to see that it works; Postman works overtime... There are a few options for running locally — AWS also has an option to run a sort of local Lambda server, to "bring Lambda up locally". Would I say it's very convenient? Not really... It's not a hit, and we found it much easier and more convenient to deploy directly to a dev environment and run everything from there.

(Ran) Meaning you have some copy of the production environment... I mean, running your own function is one thing — the question is whether it depends on other functions? On other queues? On other databases? That's where things start to get complicated. So in fact, you do all of that straight in the cloud? Not on a local workstation?

(Yinon) Correct. You can think of it as a sort of sandbox that contains the Lambdas we have in the world. We deploy the code straight there and test against one another. Naturally, every Lambda is owned by some team — every set of Lambdas, or really every service: it's not just a single Lambda; we look at a set of Lambdas as a certain service that has some purpose. We deploy the various services into the cloud and use convention to get from service to service and do all the wiring between the different Lambdas.

(Yonatan) In a certain sense, with microservices too, from a certain scale on, you're in a fairly similar problem... I mean — once you have a few hundred, you no longer bring them all up on a laptop; and here too it depends on your cloud, really.

(Ran) I agree 100%... I think the problem, or the challenge, of data-based services is reproducing some production environment. That is — if you want some copy of the production environment, with production data, but without harming production, and also without paying the cost of production. Sometimes the databases are enormous, and you don't really want a full copy — so take the subset of the data that is exactly what you need and no more — and don't harm production. That's a nontrivial challenge for anyone dealing with large volumes of data, Lambda or no Lambda.

How long have you been at Via?

(Yinon) Three years...

(Ran) OK... And when you arrived, were you already in the Serverless world?

(Yinon) That was exactly the beginning — the stage where we looked at it for the first time.

(Ran) I'll tell you why I ask — I'm trying to picture a veteran developer, another experienced developer, now joining Via.
Do you find — say, when you look at developers hired recently, and I'm not talking about juniors who've never written code, but about those who are already... you know, battle-hardened veterans...

(Yonatan) Took production down a few times already...

(Ran) Yes... Do you see that they — you know, they look at this whole Serverless world, and now they need to write some new function — do you see them fighting their natural instincts, or does it come naturally to them, so they "shed" that heavy weight they've been carrying on their shoulders until now and blossom in the Serverless environment?

(Yinon) I'd say they pretty much blossom... I mean, there's always that first dizzying transition that goes, as you mentioned earlier: "Wait, I have no connection pool", "I don't have... hold on, how does this arrive? I'm already deploying this to the cloud? What happened? Isn't it a bit early?" So there really are those few days of "wait, wait — how do I do things here?" — but it really takes very little time. At Via, you usually start writing code within less than two weeks — that is, you need to set up some local environment, see that everything works, that everything's installed and fine — and then learn the world a bit, both Via's world and the Serverless world. But really, within less than two weeks you get a task, and OK — you deploy for the first time and the second time, and from there it reaches production pretty quickly.

(Yonatan) There's also, I assume, the advantage that the scope of code you need to know in order to make a change is probably smaller — I mean, it probably depends on many other things, but it's already been narrowed down for you in advance to a certain set of functions or services...

(Yinon) Yes — it's probably easier to break things into small microservices, because the cost of standing up a microservice is almost nothing. You don't need to deploy a new pod or create something new here — it's "OK, copy the serverless.yml", which is a simple yml that only defines the service itself, with practically no settings in it. And from there you deploy a new service very, very easily, which lets us run a great many microservices. In my group alone there are between 80 and 100 microservices — and growing...

(Yonatan) Is there some optimization AWS offers where, say, they detect Lambdas that call each other frequently, and then cut out the network between them so they run in-process?

(Yinon) A neat idea... but not that I know of... What we do a lot, really, is that if we have many places calling the same code over and over again, we simply package it as packages. That is, instead of packaging it as a separate Lambda, we package it as a package and then reuse it in different places.

(Ran) Which, in "Lambda-speak", is a library, right? I mean, there's a limit on function size, and for that AWS offers packages, like a library...

(Yinon) No... Lambda itself has, built in, what's called Layers... With Layers, AWS lets you define, in the AWS settings, actual "layers" of a Lambda, which are deployed as part of the Lambda when the Lambda itself is deployed into the container. A pretty neat idea — but we don't use it much... We use plain packages to re-package our shared code, for several reasons. Layers had all sorts of technical limitations — only 5 Layers were allowed, and if you need a sixth, you're stuck. There's the issue that a cold start doesn't always reinitialize the Layer... so your control there isn't strong enough. We felt more comfortable working with Node packages, with orderly versioning, where each Lambda knows when it moves up to the next version...

(Ran) Wait — did you say Node packages? Aren't we in Python?...

(Yinon) Sorry... Python... You're right, 100%...

(Ran) We almost caught you...

(Yinon) You almost caught me... By the way — we do also have Lambdas in Node; we wrote a few Lambdas in JavaScript. There are teams that preferred working in JavaScript — not many... It works just as well as Python, by the way, though I'm more a fan of Python than of Node, so in my team we work mostly in Python.

(Ran) Right...
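[A sketch of the packaging approach Yinon describes: shared helpers live in an ordinary, versioned internal package rather than a Lambda Layer, so each Lambda pins the version it was tested against. The package and helper names here are made up for illustration:]

```python
# requirements.txt of one Lambda service might pin:
#   via-common==1.4.2      # hypothetical internal package name
#
# Inside the shared package (via_common/responses.py):
import json


def make_response(status_code, body):
    """The helper every service used to copy-paste, now shared."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


# And in any Lambda handler:
#   from via_common.responses import make_response
```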
(Ran) By the way, Yonatan, let's come back to your question — you asked whether, when functions call each other frequently, such an optimization is possible — but then, I mean... (a) it's a good idea, but in Yinon's case it probably wouldn't help, because they do everything asynchronously and put everything on a queue, so you'd have to send to the queue anyway...

(Yonatan) Wix actually gave a talk not long ago about their vision for the Serverless world, and they gave exactly that example [a month ago — Beyond Serverless and DevOps, Aviran Mordo]. I mean — that opportunity for optimization. So it's worth...

(Ran) So what — the queue also lives inside the host?

(Yonatan) I don't know... You'd have to ask Aviran; as I understood it, it was more "future plans"...

(Ran) I see. Well, Yinon — fascinating... We're coming up on the end here, so give us a few "closing words". I'm sure you're hiring...

(Yinon) Of course... We certainly are — like probably every other company in the country — but yes, we're hiring. We're hiring, by the way, in several places around the country — Tel Aviv as well as Jerusalem — and we also pretty much work from anywhere — so we'd be very happy to hear from you if you're interested. And yes — the Serverless world is fascinating in my eyes; it really changed my thinking from the moment I arrived at Via, and it truly lets me do things very, very fast and easily. And again, to sum it up — we're not dogmatic about this. We very much believe in Serverless as one of the technologies that help us move the product forward, but, you know: what works — works... If it works, don't break it...

(Ran) Well, great — thank you very much, this was interesting, and see you around.

(Yinon) Thanks Ran, thanks Yonatan.

Reversim Summit 2021: the Call for Papers is open!

The audio file is available here. Happy listening, and many thanks to Ofer Forer for the transcription.
About Patrick Strzelec

Patrick Strzelec is a fullstack developer with a focus on building GraphQL gateways and serverless microservices. He is currently working as a technical lead at NorthOne, making banking effortless for small businesses.

LinkedIn: Patrick Strzelec
NorthOne Careers: www.northone.com/about/careers
Watch this episode on YouTube: https://youtu.be/8W6lRc03QNU

This episode is sponsored by CBT Nuggets and Lumigo.

Transcript

Jeremy: Hi everyone. I'm Jeremy Daly, and this is Serverless Chats. Today, I'm joined by Patrick Strzelec. Hey, Patrick, thanks for joining me.

Patrick: Hey, thanks for having me.

Jeremy: You are a lead developer at NorthOne. I'd love it if you could tell the listeners a little bit about yourself, your background, and what NorthOne does.

Patrick: Yeah, totally. I'm a lead developer here at NorthOne. I've been focusing on building out our GraphQL gateway, as well as some of our serverless microservices. As for what NorthOne does: we are a banking experience for small businesses. Effectively, we are a deposit account with many integrations that act almost like an operating system for small businesses. Basically, we choose the best partners we can to do things like check deposits, just your regular transactions you would do, as well as any insights, and the use cases will grow. I'd like to call us a very tailored banking experience for small businesses.

Jeremy: Very nice. The thing that I think is fascinating about this is that you have just completely embraced serverless, right?

Patrick: Yeah, totally. We started off early on with this vision of being fully event-driven, and we started off with a monolith — a big Python Django monolith — and we've been experimenting with serverless all the way through. Somewhere along the journey, we decided this is the tool for us, and it just totally made sense on the business side and on the tech side. It's been absolutely great.

Jeremy: Let's talk about that, because this is one of those things where you've got a business that's a banking platform. You're handling some serious transactions here. You've got a lot of transactions going through, and you've totally embraced this. I'd love to have you take the listeners through why you thought it was a good idea. What were the business cases for it? Then we can talk a little bit about the adoption process, and then I know there's a whole bunch of stuff that you did with event-driven architecture, which is absolutely fascinating.

Then we could probably follow up with maybe a couple of challenges and some of the issues you faced. Why don't we start there? I am always fascinated to know if somebody in your organization says, "Hey, we absolutely need to do serverless," and just starts beating that drum. What was the business and technical case that made your organization swallow that pill?

Patrick: Yeah, totally. I think just at a high level, we're a user experience company. We want to make sure we offer small businesses the best banking experience possible. We don't want to spend a lot of time on operations, and reliability is incredibly important. If we can offload that burden and move faster, that's what we need to do. When we're talking about who was beating that drum, I would say our VP, Blake, really early on seemed to see serverless as this amazing fit. I joined about three years ago today, so I guess this is my anniversary at the company. We were just deciding what to build.
At the time there were a lot of architecture diagrams, and Blake hypothesized that serverless was a great fit. We had a lot of versions of the world, some with Apache Kafka and a bunch of microservices going through there. There were other versions with serverless in the mix, and some of the tooling around that, and this other hypothesis that maybe we wanted a GraphQL gateway in the middle of it. It was one of those things where we wanted to test our hypothesis as we went. That ties into this innovation velocity that serverless allows for. It's very cheap to put a new piece of infrastructure up in serverless. Just the other day we wanted to test Kinesis for an event streaming use case, and that was just half an hour to set up the config, and you could put it live in production and test it out, which is completely awesome.

I think that innovation velocity was the hypothesis. We could just try things out really quickly. They don't cost much at all. You only pay for what you use, for the most part. We were able to try that out, as well as reliability. AWS really does a good job of making sure everything's available all the time — something that maybe a young startup isn't ready to take on. When I joined the company, Blake proposed, "Okay, let's try out GraphQL as a gateway, as a concept. Build me a prototype." In that prototype, there was a really good opportunity to try serverless. Apollo Server had just launched its serverless package, which was just super easy to deploy.

It was a complete no-brainer. We tried it out, we built the case. We just started with this GraphQL gateway running on serverless, on AWS Lambda. It's funny, because at first we were still in development; nobody was hitting our services. It was still a year out from when we were going into production. Once we went into prod, this Lambda was hot all the time, which is interesting. I think the cost case breaks down there, because you're running this thing basically forever. But it was this GraphQL server in front of our Python Django monolith, with this vision of event-driven microservices, which fits well for banking. If you just think about the banking world, everything is pretty much eventually consistent.

That's just the way the systems are designed. You send out a transaction, and it doesn't settle for a while. We were always going to do event-driven, but when you're starting out with a team of three developers, you're not going to build this whole microservices environment and everything. We started with that monolith with the GraphQL gateway in front, which scaled pretty nicely — even today we have the same GraphQL gateway; we just changed the services backing it, which was really sweet. The adoption process was like, let's try it out. We tried it out with GraphQL first, and then, as we were heading into launch, we had this monolith that we needed to manage. I mean, manually managing AWS resources is easier than back in the day when you were managing your own virtual machines and stuff, but it's still not great.

We didn't have a lot of time, and there were a lot of last-minute changes we needed to make. A big refactor to our scheduled-transactions functions happened right before launch. That was an amazing serverless use case. And there's our second one, where we were like, "Okay, we need to get this live really quickly." We created this work performance pattern really quickly as a test with serverless, and it worked beautifully.
We also had another use case come up, which was just a simple phone scheduling service. We just wrapped an API and exposed some endpoints, but it was a lot easier to do with serverless. We just threw it off to two developers — figure out how you do it — and it was ready to be live. And then ...

Jeremy: I'm sorry to interrupt you, but I want to get to this point, because you're talking about standing up infrastructure, using infrastructure as code, or the tools you're using. How many developers were working on this thing?

Patrick: How many? I think at the time, maybe four developers on backend functionality before launch, when we were just starting out.

Jeremy: But you're building a banking platform here, so this is pretty sophisticated. I can imagine another business case for serverless is just the sense that we don't have to hire an operations team.

Patrick: Yeah, exactly. We were well through launch — I think we had been live for a couple of months before we hired our first DevOps engineer. Which is incredible. Our VP took on a lot of that too; I'm sure he had his hands a little dirtier than he did early on. But it was just amazing. We were able to manage all that infrastructure, and scale was never a concern. In the early stages, maybe it shouldn't be just yet, but it was just really, really easy.

Jeremy: Now you started with four, and what are you now? Somewhere around 25 developers? Somewhere in that space now?

Patrick: About 25 developers now, and we're growing really fast. We doubled this year during COVID, which is just crazy to think about, and somehow we've been scaling somewhat smoothly, at least in terms of what we're able to output as a dev team. We'll probably double again this year. This is maybe where I shamelessly plug that we're hiring, and we always are, and you can visit northone.com and just check out the careers page, or just hit me up for a warm intro. It's been crazy, and that's one of the things that serverless has helped us with too. We haven't had this scaling bottleneck, which is an operations team. We don't need to hire X operations people for a certain number of developers.

Onboarding has been easier. There was one example, during a major project, where we hired a developer. He was new to serverless, but a very experienced developer, and he had a production-ready serverless service ready in a month, which was just an insane ramp-up time. I haven't seen that very often. He didn't have to talk to any of our operations staff, and we'd already used serverless long enough that we had all of our presets and boilerplates ready, and permissions locked down, so it was just super easy. It's super empowering for him to be able to just play around with the different services. Because we hit that point where we've invested enough that every developer, when they open a branch, that branch deploys its own stage, which has all of the AWS infrastructure for our services deployed.

You might have a PR open that launches an instance of Kinesis, and five SQS queues, and 10 Lambdas, and a bunch of other things, and then tears it all down almost immediately, and the cost isn't something we really worry about. The innovation velocity there has been really, really good. Just being able to try things out.
If you're thinking about something like Kinesis, which is like a Kafka — that's my understanding — and if you think about the organizational buy-in you need for something like Kafka, because you need to support it, come up with opinions, and all this other stuff, you'll spend weeks trying it out. But for one of our developers, it's like, this seems great.

We're streaming events, we want this to be real-time, let's just try it out. This was for our analytics use case, and it's live in production now. It seems to be doing the thing, and we're testing out that use case, and there isn't that roadblock. We could always switch to a different design if we want. The experimentation piece there has been awesome. During major projects we've changed the way we've thought about our resources a few times, and in the end it works out, and often it is about resiliency. It's just jamming queues into places we didn't think about in the first place, but that's been awesome.

Jeremy: I'm curious about that, though, with 25 developers ... Kinesis for the most part works pretty well, but you do have to watch those iterator ages, and make sure that they're not backing up, or that you're losing events if they get flooded or whatever. And also, sticking queues everywhere sounds like a really good idea, and I'm a big fan of that, but it also means there are a lot of queues you have to manage, and watch, and set alarms on, and all that kind of stuff. Then you also talked about what sounds like a pretty great CI/CD process to spin up new branches and things like that. There's a lot of dev-ops-y work that is still there. How are you handling that now? Do you have dedicated ops people, or do you just have your developers looking after that piece of it?

Patrick: I would say we have a very spirited group of developers who are inspired. We do a lot of our code sharing via internal packages. A few of our developers just figured out some of the patterns that we need, whether it's CI, or how we structure our event stores, or how we do our queue subscriptions. We manage these internal packages. This won't scale well, by the way. This is just us being inspired and trying to reduce some of this burden. It is interesting; I've listened to this podcast and a few others, and this idea of infrastructure as code being part of every developer's toolbox is starting to really resonate with our team.

In our migration, or our swift shift to, I'd say, doing serverless properly, we've learned to really think in it. Think in terms of infrastructure in creating our solutions. I'm not saying we're doing serverless the right way now, but we certainly did it the wrong way in the past, where we would spin up a bunch of API gateways that would talk to each other. A lot of REST calls going around the spider web of communication. Also, I'll call these monster Lambdas, that have a whole procedure list that they need to get through, and a lot of points of failure. When we were thinking about the way we're going to do Lambda now, we try to keep one Lambda doing one thing, and then there are pieces of infrastructure stitching that together. EventBridge between domain boundaries, SQS for commands where we can, instead of using API Gateway. I think that transitions pretty well into our big break. I'm talking about this as our migration to serverless.
I want to talk more about that.

Jeremy: Before we jump into that, I just want to ask this question, because, again — some people call them fat Lambdas; I call them Lambda-liths. I think there are Lambda-liths, then fat Lambdas, then your single-purpose functions. It's interesting, moving towards that direction, and I think it's super important to just admit that you were doing this wrong, because I think so many companies find that adopting serverless is very much an evolution. It's a learning thing where the teams have to figure out what works for them, and in some cases discover best practices on their own. That you've gone through that process, I think, is great, so definitely kudos to you for that.

Before we get into that adoption, and the migration or evolution process you went through to get to where you are now — one other business or technical case for serverless, especially with something as complex as banking: I still don't understand why I can't transfer money from my personal TD Bank account to my wife's local checking account, or why that's so hard to do. But it seems like there are a lot of steps. Steps that have to work. You can't get halfway through five steps in some transaction and then be like, oops, we can't go any further — you've got to roll that back, and things like that. I would imagine orchestration is a huge piece of this as well.

Patrick: Yeah, 100%. Banking lends itself really well to these workflows, I'll call them. If you're thinking about even just the start of any banking process, there's this whole application flow where you put in all your personal information, you send off a request to your bank, and now there's this whole waterfall of things that need to happen. All kinds of checks, making sure people aren't on any fraud lists or money-laundering lists, or even just getting a second look from our compliance department. There are a lot of steps there, and even just keeping our own systems in sync with our off-provider and other places. We definitely lean on using Step Functions a lot. I think they work really, really well for our use case. Just the visual — being able to see where a customer is in their onboarding journey — is very, very powerful.

Being able to restart at any point, or even just giving our compliance team a view into that process, or even adding a pause portion. I think that's one of the biggest wins there: we can process somebody through any one of our pipelines, and we may need a human eye there, at least for this point in time. That's one of the interesting things about the banking industry: there are still manual processes behind the scenes, and — I find this term funny — there are wire rooms in banks where there are people reviewing things and all that. There are a lot of workflows that just lend themselves well to Step Functions. That pausing capability, and being able to return later with a response, allows you to build other internal applications for your compliance teams and other teams, or something behind the scenes calls back and says, "Okay, resume this waterfall."

I think that was the visualization. Especially in an events world, when you're talking about sagas — we're talking about distributed transactions here, in a way, where there are a lot of things happening — a common pattern now is the saga pattern.
You probably don't want to be doing two-phase commits and all this other stuff, but when we're looking at sagas, there's the orchestration you can do, or the choreography. Choreography gets very messy, because there's a lot of simplistic behavior: I'm a service, and I know what I need to do when these events come through, and I know which compensating events I need to dump, and all this other stuff. But now there's a very limited view.

If a developer is trying to gain context in a certain domain and understand the chain of events, although you are decoupled, there's still this extra coupling now: having to understand what's going on in your system, and being able to share it with external stakeholders. Using Step Functions — that's, I guess, the serverless way of doing orchestration — you can just share that view. We had this process where we needed to move a lot of accounts, or a lot of user data, to a different system. We were able to just use an orchestrator there as well, just to keep an eye on everything that's going on.

We might be paused in migrating, but let's say we're moving over contacts, a transaction list, and one other thing: you can visualize which one of those is in the red, and which one we need to come in and fix, and also share that progress with external stakeholders. Also, it makes for fun launch parties, I'd say. It's kind of funny, because when developers do their job, you press a button and everything launches, and there's not really anything to share or show.

Jeremy: There are no balloons or anything like that.

Patrick: Yeah. But it was kind of cool to look at these — the customer is going through this branch of the logic, and I know it's all green. Then I think one of the coolest things was the retry-ability as well. When somebody fails, or when one of these workflows fails, you can see exactly which step, you can see the logs, and all that. One of the challenges we ran into there, though, was that because we are working in the banking space, we're dealing with sensitive data. Something I almost wish AWS solved out of the box would be being able to obfuscate some of that data. Maybe you can't, I'm not sure, but we had to think of patterns — for instance, tokenization.

Stripe does this a lot, where in certain parts of their platform you put in personal information, you get back a token, and you use that reference everywhere. We do tokenization, and we also limit the amount of detail flowing through steps in our orchestrators. We'll use an event store with identifiers flowing through, and we'll do reads back to that event store in between steps, to do what we need to do. You lose some of that debuggability — you can't see exactly what information is flowing through — but we need to keep user data safe.
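A minimal sketch of the pause-and-resume step Patrick describes, using Step Functions' task-token integration — the table and field names here are illustrative, not NorthOne's actual code. A state configured with the `.waitForTaskToken` integration passes a token to this Lambda and then pauses until something calls back:

```python
import boto3

sfn = boto3.client("stepfunctions")
reviews = boto3.resource("dynamodb").Table("pending-reviews")  # hypothetical


def request_review(event, context):
    # Invoked by the state machine. "taskToken" is injected by the
    # waitForTaskToken service integration in the state definition.
    reviews.put_item(Item={
        "application_id": event["application_id"],
        "task_token": event["taskToken"],
    })
    # Returning here does NOT advance the workflow -- it stays paused.


def approve(application_id):
    # Called later by an internal compliance tool to "resume this
    # waterfall" exactly where it paused.
    item = reviews.get_item(Key={"application_id": application_id})["Item"]
    sfn.send_task_success(
        taskToken=item["task_token"],
        output='{"approved": true}',
    )
```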
Jeremy: Because that's the use case for it. You mentioned a good point about orchestration versus choreography, and I'm a big fan of choreography when it makes sense. But I think one of the hardest lessons you learn when you start building distributed systems is knowing when to use choreography and knowing when to use orchestration. Certainly in banking, orchestration is super important. Again, with those saga patterns built in, that's the kind of thing where you can get to a point in the process where you don't even need to do automated rollbacks. You can get to a failure state, and then from there that can be a pause, and then you can essentially kick off the unwinding of those things.

I love that idea of the token pattern, and just rehydrating certain steps where you need to. I think that makes a ton of sense. All right, let's move on to the adoption and the migration process, because I know this is something that really excites you, and it should, because it is cool. As you're building out applications and you start to add more capabilities and more functionality, and start really embracing serverless as a methodology, it can get really exciting. Let's take a step back. You had a champion in your organization who was beating the drum, like, "Let's try this. This is going to make a lot of sense." You built an Apollo Lambda — a Lambda running Apollo Server on it — and you were using that as a strangler pattern, routing all your stuff through it to your backend. What happens next?

Patrick: I would say that when we needed to build new features, developers just gravitated towards using serverless; it was just easier. We were using TypeScript instead of Python, which we just tend to like as an organization, so it's just easier to hop into TypeScript land, but I think it was mostly that it was easier to get something live. Now we had all these Lambdas popping up and doing their jobs, but I think the problem was that we weren't using them properly. Also, there was a lot of difference between each of our serverless setups. We would learn each time, and we'd be like, okay, we'll use this parser function here to simplify some of it, because it is very bare-bones if you're just pulling in the Serverless Framework, and it took a little ...

Every service looked very different, I would say. Also, we never really took the time to sit back and say, "Okay, how do we think about this? How do we use what serverless gives us to enable us, instead of it just being an easy thing to spin up?" I think that's where it started. It was just easy to start, but we didn't embrace it fully. I remember having a conversation at some point with our VP, being like, "Hey, how about we just put Express into one of our Lambdas?" — now I know that's a Lambda-lith. It was just easier. Everybody knows how to use Express, so why don't we just do this? Why are we writing our own parsers for all these things? We had 10 versions of a make-response helper function copy-pasted between repos, and we didn't yet have a good pattern for sharing that code in private packages.

We realized that we liked serverless, but we also realized we needed to do it better. We started with a serverless chapter — a reading group among some of our team members — and we made some moves there. We created a shared boilerplate at some point, which reduced some of the differences you'd see between repositories, but we needed a step-change difference in our thinking. When I look back, we got lucky that the opportunity came up. At this point we probably had another six Lambda services, maybe more actually. I want to say we had around 15 services at this point, without a governing body around patterns.

At this time, we had this interesting opportunity where we found out we were going to be re-platforming. A big announcement we just made last month was that we moved to a new bank partner called Bancorp — the bank partner that supports Chime — and they're, I'll call it, an engine boost.
We put in a much larger, more efficient engine for our small businesses. If you just look at the capabilities they provide, they're absolutely amazing. It's what we need to build forward. Their events API is amazing, as well as just their base banking capabilities, the unit economics they can offer, the turnaround times — things were just better. So we found out we were doing an engine swap. The people on the business side of our company trusted our technical team to do what we needed to do.

Obviously we needed to put together a case, but they trusted us to choose our technology, which was awesome. I think we just had a really good track record of delivering, so we had free rein to decide what to do. But the timeline was tight, so what we decided to do — and this was COVID times too — was that a few of our developers got COVID tested, and we rented a house and did a bubble situation, like how in the NHL or the NBA you have a bubble. We had a dev bubble.

Jeremy: The all-star team.

Patrick: The all-star team, yeah. We decided, let's sit down, let's figure out what patterns are going to take us forward. How do we make the step-change in our patterns at the same time as the step-change in our technology stack, at the same time as we're swapping out this bank, this engine, for the business? In this house, we watched almost every YouTube video you can imagine on event-driven and serverless, and leading up — just knowing that we were going to be doing this — all of us independently started prototyping, and watching videos, and reading a lot of your content, and Alex DeBrie and Yan Cui. We all had a lot of ideas already going in.

When we all got to this house, we started off with an event storming exercise, popular in the domain-driven design community, where we just threw our entire business up on a wall with sticky notes. It would have been better to have every business stakeholder there, but luckily we had two people from our product team there as representatives. That's how invested we were in building this right — we had product sitting in the room with us to figure it out.

We slapped our entire business down on a wall — this took days — and then drew circles around it and iterated on that for a while. Then we started looking at what the technology looks like: what are our domain boundaries, and what prototypes do we need to make? For a few weeks there, we were just prototyping. We built out what I called "baby's first balance". That was the running joke: how do we get an account opened with a balance, with the transactions, minimally, with some new patterns? We really embraced some of this domain-driven design thinking, as well as just event-driven thinking. When we were rethinking the architecture, three concepts became very important for us — not entirely new, but important. Idempotency was a big one; dealing with distributed transactions was another; as well as eventual consistency. The eventual consistency portion is kind of funny, because we were already doing it a lot.

Our transactions wouldn't always settle very quickly. We didn't realize it, but now our whole system becomes eventually consistent, typically, once you divide all of your architecture across domains and decouple everything. We created some early prototypes, and we created our own version of an event store, which is, I would say, an opinionated schema around DynamoDB, where we keep track of revisions, payload, timestamp — all the things you'd want to be able to do event sourcing.
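A minimal sketch of what an "opinionated schema around DynamoDB" with an optimistic-concurrency dispatch function might look like — the table, attribute, and function names here are illustrative, not NorthOne's actual internal package:

```python
import time

import boto3
from botocore.exceptions import ClientError

events = boto3.resource("dynamodb").Table("event-store")  # hypothetical


class ConcurrencyError(Exception):
    """Another writer appended to this aggregate first."""


def dispatch(aggregate_id, expected_revision, event_type, payload):
    # Append one event, keyed by (aggregateId, revision). The conditional
    # write enforces optimistic concurrency: it fails if an event with
    # this revision already exists for the aggregate.
    try:
        events.put_item(
            Item={
                "aggregateId": aggregate_id,
                "revision": expected_revision + 1,
                "type": event_type,
                "payload": payload,
                "timestamp": int(time.time() * 1000),
            },
            ConditionExpression="attribute_not_exists(aggregateId)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            raise ConcurrencyError(aggregate_id) from err
        raise
```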
Event sourcing was another thing we decided on; it seemed like the right approach for state for a lot of our use cases. Banking, if you just think about a banking ledger, is events, like an accounting ledger. You're just adding up rows: add, subtract, add, subtract.

We created a lot of prototypes for these things. Our event store pattern became basically just DynamoDB with opinions around the schema, as well as a shared code package with a simple dispatch function. One dispatch function really looks at enforcing optimistic concurrency, and one is a little bit more relaxed. Then we also had some reducer functions built into there. That was one of the packages that we created. Another prototype around that was, how do we create the actual subscriptions to this event store? We landed on SNS to SQS fan-out, and it seems like fan-out first is the serverless way of doing a lot of things. We learned that along the way, and it makes sense. It was one of those things we read from a lot of these blogs and YouTube videos, and it really made sense in production: all the data is streaming from one place, and then you just add subscribers all over the place, just new queues. Fan-out first, highly recommend. We just landed on there by following best practices.

Jeremy: Great. You mentioned a bunch of different things in there, which is awesome. So you get together in this house, you come up with all the events, you do this event storming session, which is always a great exercise. You get a pretty good visualization of how the business is going to run from an event standpoint. Then you start building out this event driven architecture, and you mentioned some packages that you built, and we talked about Step Functions and the orchestration piece of this. Just give me a quick overview of the actual system itself. You said it's backed by DynamoDB, but then you have a bunch of packages that run in between there, and then there's a whole bunch of queues, and then you're using some custom packages. I think I already said that, but you're using ... are you using EventBridge in there? What's some of the architecture behind all that?

Patrick: Really, really good question. Once we created these domain boundaries, we needed to figure out how we communicate between domains and within domains. We landed on really differentiating milestone events and domain events. Milestone events, in other terms, might be called integration events: key business milestones. An account was opened, an application was approved or rejected, things that every domain may need to know about. Then within our domains, or domain boundaries, we had these domain events, which might reduce to a milestone event, and we can maintain those contracts in the future and change those up. We needed to think about how we message all these things across. How do we communicate? We landed on EventBridge for our milestone events. We have one event bus that we talk to between domain boundaries, basically.

EventBridge there, and then each of our services subscribes to that EventBridge and maintains its own event store, backed by DynamoDB. Each of our services has its own data store. It's usually an event stream or a projection database, but it's almost all Dynamo, which is interesting, because our old platform used Postgres, and we did have relational data. I was really scared at first: how are we going to maintain relations and things? It became a non-issue; I don't even know why, now that I think about it. Every service maintains its nice projection through events and builds its own view of the world, which brings its own problems.
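(For illustration, publishing one of those milestone events to the shared bus might look something like this. The bus name, source, and detail shape are all hypothetical, assuming the AWS SDK for JavaScript v3.)

```typescript
import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";

const eventBridge = new EventBridgeClient({});

// Hypothetical milestone event: every domain may care that an account opened.
export async function publishAccountOpened(accountId: string): Promise<void> {
  await eventBridge.send(
    new PutEventsCommand({
      Entries: [
        {
          EventBusName: "milestone-bus", // hypothetical shared bus
          Source: "accounts",            // hypothetical domain name
          DetailType: "AccountOpened",
          Detail: JSON.stringify({ accountId, openedAt: new Date().toISOString() }),
        },
      ],
    })
  );
}
```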
We have DynamoDB in there, and then SNS to SQS fan-out. Then when we're talking about packages ...

Jeremy: That's off of Streams?

Patrick: Exactly, yeah. We're going Dynamo streams to SNS, to SQS. Then we use shared code packages to make those subscriptions very easy. If you're looking at doing that SNS to SQS fan-out, or just creating SQS queues, there is a lot of CloudFormation boilerplate that we were creating, and we needed to move really quickly on this project. We got pretty opinionated quickly, and we created our own subscription function that just generates all this CloudFormation with naming conventions, which was nice. I think the opinions were good, because early on we weren't opinionated enough, I would say. You'd look in your AWS dashboard and the resources for these weren't prefixed correctly, and there's all this garbage. Now you're able to have consistent naming throughout and make it really easy to subscribe to an event.

We would publish packages to help with certain things. Our event store package was one of those. We also created a Lambda handlers package, which leverages a Lambda middleware compose package that's out there, which is quite nice. All the common functionality we were doing a lot of, like parsing a body from S3, or SQS, or API Gateway, is just middleware that we now publish. Validation in and out: we highly recommend the library Zod. We really embrace TypeScript-first object validation. Really, really cool package. We created all these middlewares, and then subscription packages. We have a lot of shared code in this internal NPM repository that we install across services.

I think one challenge we had there was, eventually you abstract away too much from the CloudFormation, and it's hard for new developers. It's easy for them to create event subscriptions, but it's hard for them to evolve our serverless thinking, because they're so far removed from it. I still think it was the right call in the end, and I think this is the next step of the journey: figuring out how we share code effectively while not hiding away too much of serverless, especially because it's changing so fast.

Jeremy: It's also interesting, though, that you take that approach to hide some of that complexity and bake in some of that boilerplate that someone mostly didn't have to write themselves anyway. Like you said, copying and pasting between services is not the best way to do it. I tried the whole shared packages thing one time, and it kind of worked. It's just, when you make a small change to that package and you have 14 services that you then have to update to get the newest version, sometimes that's a little frustrating. Lambda layers haven't been a huge help with some of that stuff either.
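(As a sketch of the validation-middleware idea Patrick describes: the schema and wrapper below are hypothetical, but the Zod calls are the library's real API.)

```typescript
import { z } from "zod";

// Hypothetical inbound event schema for a banking service.
const TransactionEvent = z.object({
  accountId: z.string(),
  amountCents: z.number().int(),
  kind: z.enum(["credit", "debit"]),
});
type TransactionEvent = z.infer<typeof TransactionEvent>;

// Hypothetical middleware: validate the payload before the handler ever sees it.
const withValidation =
  (handler: (event: TransactionEvent) => Promise<void>) =>
  async (raw: unknown): Promise<void> => {
    // .parse throws a descriptive ZodError on bad input, so malformed
    // messages fail fast instead of corrupting downstream state.
    await handler(TransactionEvent.parse(raw));
  };

export const handler = withValidation(async (event) => {
  console.log(`processing ${event.kind} for ${event.accountId}`);
});
```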
But anyway, it's interesting, because again, you've mentioned queues a number of times. You did mention resiliency in there, but I want to touch on that point a little bit, because that's one of those things too where I would assume, in a banking platform, you do not want to lose events. You don't want to lose things. So if something breaks, or something gets throttled or whatever, you have to go and retry those events, and have the alerts in place to know that a queue is backed up or whatever. And then I'm thinking ordering issues and things like that. What kinds of issues did you face, and tell me a little bit more about what you've done for reliability?

Patrick: Totally. Queues are definitely ... SQS is a workhorse for our company right now. We use a lot of it. Dropping messages is one of the scariest things, so you're dead-on there. When we were moving to event driven, that was what scared me the most. What if we drop an event? A good example of that is if you're using EventBridge and you're subscribing Lambdas to it: I was under the impression early on that EventBridge retries forever, but I'm pretty sure it'll only retry an invocation twice. I think that's what we landed on.

Jeremy: Interesting.

Patrick: I think so, and don't quote me on this. That was an example of where a dropped message could be a problem. We put a queue in front of there, an SQS queue, as the subscription. That way, if there's any failure to deliver, it's just going to retry for a number of days. At that point you've got to think about DLQs, and that's something we're still thinking about. But I think the reason we've been using queues everywhere is that queues are now in charge of all your retries. Now that we've decomposed what was one big Lambda into five Lambdas with queues in between, if anything fails in there, it just pops back into the queue and it'll retry. It can still drop messages after a few days, though, and that's something we learned, luckily, in the prototyping stage, which is why there are a few places where we use dead letter queues. But one of the issues there as well was ordering. Ordering didn't play too well with ...

Jeremy: Not with DLQs. No, it does not, no.

Patrick: I think that's one lesson I'd want to share: only use ordering when you absolutely need it. We found ways to design some of our architecture where we didn't need ordering. There are places we were using FIFO SQS, which was something that had just launched when we were building this thing. When we were thinking about messaging, we were like, "Oh, well, we can't use SQS because it doesn't respect ordering." Then bam, the next day we see this blog article. We got really hyped on that and used FIFO everywhere, and then realized it's unnecessary in most use cases. So when we were going live, we actually changed those FIFO queues into just regular SQS queues in as many places as we could. In that use case, you can really easily attach a dead letter queue and you don't have to worry about anything, but with FIFO, things get really, really gnarly.

Ordering is an interesting one. Another place we got burned, I think, or a tough thing to do with dead letter queues, is when you're using state machines. We needed to limit the concurrency of our state machines, which is another wishlist item for AWS. I wish there were just, at the top of the file, a way to limit the concurrent executions of your state machine. Maybe it exists; maybe we just didn't learn to use it properly, but we needed it. There are a few patterns out there. I've seen the [INAUDIBLE] pattern, where you can use the actual state machine flow to look back at how many concurrent executions you have, and pause. We landed on setting reserved concurrency on a number of Lambdas and throwing errors: if we've hit the max concurrency, it'll pause that Lambda. But the problem with DLQs there was, these are all errors. They're coming back as errors. We're like, we're fine with them, this is a throttle error, that's fine. But it's hard to distinguish that from a poison message in your queue, so when do you dump those into the DLQ? If it's just a throttling thing, I don't think it matters to us.
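(A hedged sketch of that distinction in an SQS consumer. The error class and helper functions are hypothetical; the pattern is to rethrow retryable errors so the queue redelivers, and to park genuine poison messages explicitly.)

```typescript
import type { SQSEvent, SQSRecord } from "aws-lambda";

// Hypothetical error type thrown when we hit our own concurrency throttle.
class ThrottleError extends Error {}

async function processMessage(body: unknown): Promise<void> {
  // hypothetical business logic goes here
}

async function parkInDlq(record: SQSRecord, reason: string): Promise<void> {
  // hypothetical: forward to a dedicated DLQ or alerting channel
  console.warn("poison message", { messageId: record.messageId, reason });
}

export const handler = async (event: SQSEvent): Promise<void> => {
  for (const record of event.Records) {
    try {
      await processMessage(JSON.parse(record.body));
    } catch (err) {
      if (err instanceof ThrottleError) {
        // Transient: rethrow so SQS makes the message visible again and retries.
        throw err;
      }
      // Anything else is treated as a poison message: park it with context
      // instead of letting it block the queue through endless retries.
      await parkInDlq(record, (err as Error).message);
    }
  }
};
```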
That was another challenge we had. We're still figuring out dead letter queues and alerting. I think for now we've just relied a lot on CloudWatch alarms for our alerting, and there's a lot you can do. Even just in the state machines, you can get pretty granular there: you can know once certain things fail and announce it to your Slack channel. We use that Slack integration; it's pretty easy. You just go to a Slack channel, there's an email address in there, you plop it into the console in AWS, and you have your very early alerting mechanism there.

Jeremy: The thing with Elasticsearch ... not Elasticsearch, I'm sorry, I'm totally off-topic here. The thing with EventBridge and Lambda, these are one of those things that, again, they're nuances. As long as EventBridge can deliver to the Lambda service, the Lambda service kicks off and queues it automatically, and then that will retry a certain number of times; I think you can control that now. But then if that retries multiple times and eventually fails, it kicks it over to the DLQ or whatever. There are all different ways that it works like that, but that's why I always liked the idea of putting a queue in between there as well, because I felt you just had a little bit more control over exactly what happens.

As long as it gets to the queue, then you know you haven't lost the message, or you hope you haven't lost the message. That's super interesting. Let's move on a little bit to the adoption issues. You mentioned a few of these things, obviously issues with concurrency and ordering, and some of that other stuff. What about some of the other challenges you had? You mentioned this idea of writing all these packages, and it pulls devs away from the CloudFormation a little bit. I do like that, in that I think it accelerates a lot of things, but what are some of the other challenges that you've been having just getting this thing up and running?

Patrick: I would say IAM is an interesting one. Because we are in the banking space, we want to be very careful about what access we give to which machines or developers; I think machines are important too. There have been cases where ... so we do have a separate developer setup with its own permissions, and in development it's really easy to spin up all your services within reason. But now that we're going into production, there are times when our CI doesn't have the permissions to delete a queue or create a queue, or certain things, and there's a lot of tweaking you have to do there. You've got to do a lot of thinking about your IAM policies as an organization, especially because now every developer is touching infrastructure.

That becomes this shared operational overhead that serverless did introduce. We're still figuring that out. Right now we're functioning on least privilege, so it's better to not be able to deploy than to deploy something you shouldn't, or to read logs that you shouldn't, and that's where we're starting. But that's something that will be a challenge for a little while, I think. There are all kinds of interesting things out there. I think temporary IAM permissions are a really cool one. There are times we're in production and we need to view certain logs, or be able to access a certain queue, and there's tooling out there where, or at least so I've heard, you can give temporary permissions. You have this queue permission for 30 minutes, and it expires and it's audited, and I think there's some CloudTrail tie-in you could do there.
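(On the least-privilege point, infrastructure-as-code makes narrow grants cheap. A minimal sketch with the AWS CDK; the construct names are hypothetical, but grantReadData and grantConsumeMessages are real CDK helpers that synthesize scoped IAM statements.)

```typescript
import { Stack } from "aws-cdk-lib";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as sqs from "aws-cdk-lib/aws-sqs";

// Inside a hypothetical CDK stack: give each function only what it needs.
export function wire(stack: Stack): void {
  const table = new dynamodb.Table(stack, "EventStore", {
    partitionKey: { name: "aggregateId", type: dynamodb.AttributeType.STRING },
    sortKey: { name: "revision", type: dynamodb.AttributeType.NUMBER },
  });
  const queue = new sqs.Queue(stack, "TransactionsQueue");
  const fn = new lambda.Function(stack, "Projector", {
    runtime: lambda.Runtime.NODEJS_14_X,
    handler: "index.handler",
    code: lambda.Code.fromAsset("dist"), // hypothetical build output
  });

  // Scoped grants instead of wildcard policies: read-only on the table,
  // consume-only on the queue.
  table.grantReadData(fn);
  queue.grantConsumeMessages(fn);
}
```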
I'm speaking about my wishlist for our next evolution here. I hope my team is listening ...

Jeremy: Your team's listening to you.

Patrick: ... and will be inspired as well.

Jeremy: What about ... because this is something that I always found to be a challenge, especially when you start having multiple services. You've talked about these domain events, but then milestone events. You've got different services that need to communicate across services, or across domains, and realize certain things like that. Service discovery in and of itself: which queue are we mapping to, which service am I talking to, and which version of the service am I talking to? Things like that. How have you been dealing with that stuff?

Patrick: Not well, I would say. Very, very ad hoc. Right now, at least, we have tight communication between the teams, so we roughly know which service we need to talk to, and we output our URLs in the CloudFormation output, so at least you can reference the URLs across services a little more easily. Really, GraphQL is one of the only services that talks to a lot of our API gateways, so there's less of that knowing-which-endpoint-to-hit. Most of our services read off of EventBridge, and within services a lot of that's abstracted away; the queue subscriptions are a little easier. Service discovery is a bit of a nightmare.

Once our services grow, it'll be a huge challenge to understand. Even which services are using older versions of Node, for instance. I saw that AWS is now deprecating version 10, and we'll have to take a look internally: are we using version 10 anywhere, and how do we make sure that's fine? Or even things like knowing which services now have vulnerabilities in their NPM packages, because we're using Node. That's another thing. I don't even know if that falls under service discovery, but it's an overhead of ...

Jeremy: It's service management too. There's a lot there. That actually brings me to this idea of observability. You mentioned doing some CloudWatch alerts and some of that stuff, but what about using an observability tool, or tracing like X-Ray, and things like that? Have you been implementing any of that, and if you have, have you had any success and/or problems with it?

Patrick: I wish we had a better view of some of the observability tools. I think we were just building so quickly that we never really invested the time into trying them out. We rolled our own tooling internally to at least do what we know, and we did use X-Ray. But the problem with X-Ray is, although we do subscribe all of our services to it, X-Ray isn't implemented everywhere internally in AWS, so we lose our trail somewhere in that Dynamo stream to SNS, or SQS. It's not a full trace. Also, just digesting that huge graph of information is very difficult. I don't use it often. I think it's a really cool graphic to show: "Hey, look how many services are running, and it's going so fast."

It's a really cool thing to look at, but it hasn't been very useful. I think our most useful tool for debugging and observability has been just our logging. We created a JSON logger package, so we output JSON logs and we can actually filter off of different properties, and we ship those to Elasticsearch.
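(A minimal sketch of such a structured logger. The field names and environment variable are hypothetical; the point is that every line is a single JSON object a log pipeline can filter on once shipped to Elasticsearch.)

```typescript
// Minimal structured JSON logger: one JSON object per line, so fields like
// level, service, and traceId become queryable in the log pipeline.
type Level = "debug" | "info" | "warn" | "error";
type Fields = Record<string, unknown>;

const SERVICE = process.env.SERVICE_NAME ?? "unknown"; // hypothetical env var

export function log(level: Level, message: string, fields: Fields = {}): void {
  // traceId is expected in fields, propagated by middleware (the Lumigo-style
  // trace ID pattern mentioned below).
  console.log(
    JSON.stringify({
      ts: new Date().toISOString(),
      level,
      service: SERVICE,
      message,
      ...fields,
    })
  );
}

// Usage: log("info", "transaction settled", { traceId: "abc-123", accountId: "a-1" });
```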
Now you can have a view of all of the functions within a given domain at any point in time. You can really see the story. Because early on, when we were opening up CloudWatch and you'd have 10 tabs open, trying to understand that flow of information was very difficult.

We also implemented our own trace ID pattern. I think we just followed a Lumigo article, where we introduced some properties in each of our Lambdas at a higher level, in one of our middlewares, and we were able to trace through. It's not ideal. Observability is something that we'll probably have to work on next. It's been tolerable for now, but I can't see it scaling well for much longer.

Jeremy: That's the other thing too, even with the shared package issue: with an observability tool, they'll just install a layer or something, where you don't necessarily have to worry about updating your own tool. I always find, if you are embracing serverless and you want to get rid of all that undifferentiated heavy lifting, there are a lot of really good observability tools out there doing some great stuff, and they're specializing in it. It might be worth letting someone else handle that for you rather than trying to do it yourself internally.

Patrick: Yeah, 100%. Do you have any that you've used that are particularly good? I know you work with serverless, so ...

Jeremy: I've played around with all of them, because I love this stuff, so it's always fun. Obviously Lumigo and Epsagon, and Thundra, and New Relic. They're all great. They all do things slightly differently, but they all follow a similar implementation pattern, so it's very easy to install them. We can talk more about some recommendations. I think it's just one of those things where, in a modern application, not having that insight is really hard. It can be really hard to debug stuff. If you look at some of the tools that AWS offers, I think they're there; it's just that they're maybe a little harder to implement, and not quite as refined and targeted as some of the observability tools. But still, you've got to get there. Again, that's why I keep saying it's an evolution, it's a process. Maybe one time you get burned and you're like, "We really needed to have observability," and that's when it becomes more of a priority, when you're moving fast like you are.

Patrick: Yeah, 100%. I think it's got to be a priority sooner rather than later. I'll do some reading now that you've dropped some of these options. I have seen them floating around, but it's one of those things where, when it's too late, it's too late.

Jeremy: It's never too late to add observability, though, so you should. Actually, a lot of them now make it really, really easy. I'm not trying to pitch any particular company, but take a look at some of them, because they are really great. Just one other challenge that I also find a lot of people run into, especially with serverless, because there are all these artificial account limits in place: even the number of queues you can create, the number of concurrent Lambda functions in a particular region, and stuff like that. Have you run into any of those account limit issues?

Patrick: Yeah. I can give you the easiest way to run into an account limit issue, and that is to replay your entire EventBridge archive to every subscriber. You will find a bottleneck somewhere. That's something ...

Jeremy: Somewhere it'll fall over? Nice.

Patrick: 100%.
It's a good way to do a quick check in development to see where you might need to buffer something, but we have run into that. I think the solution there, in a lot of places, was really playing with concurrency where we needed to, and being thoughtful about reserving concurrency in the places that absolutely needed to stay functioning. The challenge there is that it eats into your total account concurrency, which was an interesting learning. So, definitely playing around there, and just being thoughtful about where you are replaying. A couple of things: we use replays a lot. Because we are using these milestone events between service boundaries, when you launch a new service, you want to replay that whole history all the way through.

We've done a lot of replaying, and that was one of the really cool things about EventBridge. It was just so easy. You just set up an archive, and it'll record everything coming through, and then you just press a button in the console and it'll replay all of them. That was really awesome. But be very mindful of where you're replaying to. If you replay to all of your subscriptions, you'll hit Lambda concurrency limits real quick. In another case early on, we needed to replay ... we have our own domain event store, and we wanted to replay some of those events. Those come off the Dynamo stream, so we were using Dynamo streams to SNS and fanning out to all of our SQS queues. But there would only be one or two queues that actually needed to subscribe to those events, so we created an internal utility just to dump those events directly into the SQS queue we needed. I think it's just about not being wasteful with your resources, because they are cheap, sure.
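(For the broadcast-replay problem, EventBridge can scope a replay to specific rules rather than every subscriber. A hedged sketch, assuming the AWS SDK for JavaScript v3; the names and ARNs are hypothetical placeholders.)

```typescript
import { EventBridgeClient, StartReplayCommand } from "@aws-sdk/client-eventbridge";

const eventBridge = new EventBridgeClient({});

async function replayForNewService(): Promise<void> {
  await eventBridge.send(
    new StartReplayCommand({
      ReplayName: "backfill-new-service", // hypothetical
      EventSourceArn:
        "arn:aws:events:us-east-1:123456789012:archive/milestone-archive", // hypothetical
      EventStartTime: new Date("2020-01-01T00:00:00Z"),
      EventEndTime: new Date(),
      Destination: {
        Arn: "arn:aws:events:us-east-1:123456789012:event-bus/milestone-bus", // hypothetical
        // Replay only to the new service's rule, not to every subscriber,
        // so you don't blow through account-wide Lambda concurrency.
        FilterArns: [
          "arn:aws:events:us-east-1:123456789012:rule/milestone-bus/new-service-rule",
        ],
      },
    })
  );
}

replayForNewService().catch(console.error);
```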
Jeremy: But if you use them, they start to cost money.

Patrick: Yeah, they start to cost some money, and they can also lock you out of other functionality. If you hit your Lambda limits, now your API Gateway is tapped.

Jeremy: That's a good point.

Patrick: You could take down your whole system if you just aren't mindful about those limits, and then you're calling up AWS in a panic: "Hey, can you update our limits?" Luckily we haven't had to do that yet, but it's definitely something in your back pocket if you need it, if you can make the case to AWS that maybe you do need bigger limits than the default. So, just not being wasteful, and being mindful of where you're replaying. Another interesting thing there is dealing with partners. It's really easy to scale in the Lambda world, but not every partner can handle that volume as quickly. If you're not buffering every event coming through EventBridge to a new service that hits a partner each time, you're going to hit their API rate limit really quickly, because it'll just blow right through it.

You might be doing thousands of API calls when you're instantiating a new service. That's one of those interesting things we have to deal with, particularly in our orchestrators, because they are talking to different partners. That's why we need to make sure we can limit the concurrent executions of the state machines themselves. In a way, some of our architecture is too fast to scale.

Jeremy: It's too good.

Patrick: You still have to consider downstream. That, and even just, if you are using relational databases or anything else in your system, now you have to worry about connection limits and ...

Jeremy: I have a whole talk I gave on that.

Patrick: ... spikes in traffic.

Jeremy: Yes, absolutely.

Patrick: Really cool.

Jeremy: I know all about it. Any final advice for companies like yours that are trying to bite off a piece of the serverless apple, I guess? That's really bad. Anyway, any advice for people looking to get into this?

Patrick: Yeah, totally. I would say start small. I think we were wise to just try it out. It might not land with your development team, and if they don't really buy in, it's one of those things that could just end up unnecessarily messy. So start small, see if you like it in your shop, and then reevaluate once you hit a certain point. That, and I would say, shared boilerplate packages sooner rather than later. I know shared code is a problem, but it is nice to have a starter pack so that you're at least not doing anything really crazy. Even just things like having opinions around logging: in our industry, it's really important that you're not logging sensitive details.

For us, that means things like wrapping our HTTP clients to make sure we're not logging sensitive details, or having shared Lambda packages that make sure that, out of the box, you're opinionated about not doing something terribly awful. I would say those two things, start small and a boilerplate package, and maybe the third thing is just pay attention to the code smell of a growing Lambda. If you are doing three API calls in one Lambda, chances are you could probably break that up and think about it in a more resilient way: if any one of those pieces fails, you now have retryability in each one of them. Those are the three things I would say. I could probably talk forever about the rest of our journey.

Jeremy: I think that was great advice, and I love hearing about how companies are going through this process and what that process looks like. And I hope, I hope, I hope that companies listen to this and can skip a lot of these mistakes. I don't even want to call them all mistakes; I think it's just evolution. The stuff that you've done, we've all done; we've all gone through that process, and the more we can solidify these practices, the more companies will benefit from hearing stories like these. Thank you very much for sharing that. Again, thank you so much for spending the time to do this and sharing all of this knowledge and this journey that you've been on, and are continuing to be on. It would be great to continue to get updates from you. If people want to contact you, I know you're not on Twitter, but what's the best way to reach out?

Patrick: I almost wish I had a Twitter. It's the developer thing to have, so maybe in the future. Just LinkedIn would be great, as well as, if anybody's interested in working with our team and figuring out how to take serverless to the next level, hit me up on LinkedIn or look at our careers page at northone.com, and I can give you a warm intro.

Jeremy: That's great. Your last name is spelled S-T-R-Z-E-L-E-C. How do you say that again? Say it in Polish, because I know I said it wrong in the beginning.

Patrick: I guess for most people it would just be "Strzelec," but if there are any Slavs in the audience, it's "Strzelec." A very intense, four-consonant last name.

Jeremy: That is a lot of consonants. Anyway, again, Patrick, thanks again.
This was great.

Patrick: Yeah, thank you so much, Jeremy. This has been awesome.
About Julian Wood

Julian Wood is a Senior Developer Advocate for the AWS Serverless Team. He loves helping developers and builders learn about, and love, how serverless technologies can transform the way they build and run applications at any scale. Julian was an infrastructure architect and manager in global enterprises and start-ups for more than 25 years before going all-in on serverless at AWS.

Twitter: @julian_wood
All things Serverless @ AWS: ServerlessLand
Serverless Patterns Collection
Serverless Office Hours – every Tuesday 10am PT
Lambda Extensions
Lambda Container Images

Watch this episode on YouTube: https://youtu.be/jtNLt3Y51-g

This episode is sponsored by CBT Nuggets and Lumigo.

Transcript

Jeremy: Hi everyone, I'm Jeremy Daly and this is Serverless Chats. Today I'm joined by Julian Wood. Hey Julian, thanks for joining me.

Julian: Hey Jeremy, thank you so much for inviting me.

Jeremy: Well, I am super excited to have you here. I have been following your work for a very long time and, of course, I'm a big fan of AWS. So, you are a Serverless Developer Advocate at AWS, and I'd love it if you could just tell the listeners a little bit about your background, so they get to know you a bit, and then also sort of what your role is at AWS.

Julian: Yeah, certainly. Well, I'm Julian Wood. I am based in London, but please don't let my accent fool you: I'm actually originally from South Africa, so the language purists aren't scratching their heads anymore. I work within the Serverless Team at AWS, and hopefully do a number of things. First of all, explain what we're up to and how our serverless things work, and, as I like to sometimes say a bit cheekily, basically help the world fall in love with serverless as I have. And then, from the other side, be a proxy and sort of be the voice of builders, and developers, and whoever's building serverless applications, and be their voice internally. So you can also keep us on our toes to help build the things that will brighten your days.

And just before this, I worked for probably too many years as an infrastructure racker, stacker, architect, and manager. I've worked in global enterprises babysitting their Windows and Linux servers, running virtualization, and doing all the operations kind of stuff to support that. But I was always thinking there's a better way to do this, and we weren't doing the best for the developers and internal customers. And so when this, you know, in inverted commas, "serverless way" of things started to appear, I just knew that this was going to be the future, and I could happily leave the server side to much better and cleverer people than me. So by some weird, auspicious alignment of the stars, a while later I managed to get my current dream job talking about serverless and talking to you.

Jeremy: Yeah. Well, I tell you, I think a lot of serverless people, or people who love serverless, are recovering ops and infrastructure people that were doing racking and stacking, because I too am recovering from that, and I still have nightmares.

I thought it was interesting, too, how you mentioned developer advocacy. It's funny: you work for a specific company, AWS obviously, but even developer advocacy in general, who is that for? Who are you advocating for? Are you advocating for the developers to use the service from the company? Are you advocating for the developers so that the company can provide the services that they actually need? Interesting balance there.

Julian: Yeah, it's true.
I mean, the honest answer is we don't have a great term for this kind of role, but I think primarily we are advocating for the people who are developing the applications, on the outside. And to advocate for them means we've got to build the right stuff for them and get their voices internally. There are many ways of doing that. Some people raise support requests and other kinds of things, but sometimes some of our great ideas come from trolling Twitter, or, yes, I know, even Hacker News, or that kind of thing. We may also get responses from 10 different people about something, and that will formulate something in our brain, and we'll chat with other people, and that sort of starts a thing. It's not necessarily that each time some good idea comes in on Twitter, it gets mashed into some big database that we all pick from.

Part of our job is to be out there and try to think and be developers, from whatever backgrounds we come from. I'm not a pure software developer by background; I come, I suppose, from infrastructure, though maybe you'd call that a bit of systems engineering. So I try to bring that background to give input on whatever we do. Hopefully, the right stuff.

Jeremy: Right. Yeah. And then I think part of the job, too, is just getting the information out there and getting the examples out there, trying to create those best practices, or at least surface those best practices, and encourage the community to do a lot of that work and to follow that. And you've done a lot of work with that, obviously, writing for the AWS blog. I know you have a series on the Serverless Lens and the Well-Architected Framework, and we can talk about that in a little while. But I really want to talk to you about, I guess, the expansion of serverless over the last couple of years.

I mean, it was probably very narrowly focused when it first came out. Lambda was ... FaaS was a whole new concept for a lot of people. And then as this progressed, and we've gotten more APIs, and more services and things that it can integrate with, it just becomes complex and complicated. And that's a good thing, but also maybe a bad thing. But one of the things that AWS has done, and I think this is clearly in reaction to developers needing it, is the ability to extend what you can do with a Lambda function, right? I mean, the idea of just putting your code in there and then, boom, that's it, that's all you have to do: that's great. But what if you do need access to lifecycle hooks? Or what if you do want to manipulate the underlying runtime or something like that? And AWS, I think, has done a great job with that.

So maybe we can start there, with the extensibility of Lambda in general. One of the new things that was launched recently, and recently, I don't know, what was it? Seven months ago at this point? I'm not even sure. But it was launched fairly recently, let's say that, is Lambda Extensions, and a couple of different flavors of that as well. Could you kind of just give the users an over ... the users, wow, the listeners an overview of what Lambda Extensions are?

Julian: I could hear the ops background coming in, talking about our users. Yeah. But I mean, from the get-go, serverless was always a terrible term, because why on earth would you name something for what it isn't? I remember talking to DBAs about NoSQL, and they'd go, "Well, if it's not SQL, then what is it?"
So we're terrible at that, serverless as well. And yeah, Lambda was very constrained when it came out. Lambda was never built to be a serverless thing; that's just what the outcome was. Sometimes we focus too much on the tools rather than the outcome. And there's the story of S3 just turning 15: the genesis of Lambda was being an event trigger for S3, and people thought, you'd upload something to S3 and fire off a Lambda function, how cool is that? And then obviously the clever folks at the time were like, "Well, hang on, let's not just do this for S3, let's do this for a whole bunch of things."

So Lambda was born out of that, and it's got that great history, which has created an arc into the present and into the future, which I know we're also going to get onto: the power of event driven applications. But the power of Lambda has always been its simplicity, and removing that operational burden and heavy lifting. Sometimes, though, that line is a bit of a gray area, and there are people who can be purists about serverless and purists about FaaS and say, "Everything needs to be ephemeral. Lambda functions can't extend to anything else. There shouldn't be any state, shouldn't be any storage, shouldn't be any ..." All this kind of thing.

And I think, and I don't want to speak for you, but I think both of us would agree that in some sense, yeah, that's fine. But we live in the real world, and there's other stuff that needs to be connected to, and we're not here to build purist kind of stuff. So Lambda Extensions is basically a new way to integrate Lambda with your favorite tools. That's the headline thing we like to talk about. And the big idea is to open up Lambda to work more effectively, mainly with partners, but also with your own tools if you want to write them, and to have deeper hooks into the Lambda lifecycle.

Our partners are awesome, and they do a whole bunch of stuff for serverless. Plus, customers also have connections to on-prem stuff, or EC2 stuff, or containers, or all kinds of things. How can we make the tools more seamless, in a way? How can we have a common set of tools that you maybe even use on-prem, or in the cloud, or in containers, or whatever? Why does Lambda have to be unique or different, or that kind of thing? And Extensions is one of the starts of that: being able to use these kinds of tools and get more out of Lambda. Just for the kinds of tools that we've already got on board, there are things like Splunk and AppDynamics. And Lumigo, Epsagon, HashiCorp, Honeycomb, Coralogix, Dynatrace, I can't think of them all. Thundra and Sumo Logic, Check Point. Yeah, I'm sorry, sorry to any partners I've forgotten.

Jeremy: No, right, no. That's very good. Shout them out, shout them out. No, I mean, just, and not to interrupt you here, but ...

Julian: No, please.

Jeremy: ... I think that's great. I mean, I think that's one of the things that I like about the way that AWS deals with partners: I think AWS knows they can't solve all these problems on their own. I mean, maybe they could, right?
But it would be their own way of solving the problems, and there are other people who are solving these problems differently. Giving you the ability to extend your Lambda functions to those partners is a huge win, not only for the partners, because it creates that ecosystem for them, but also for AWS, because it makes the product itself more valuable.

Julian: Well, never mind the big win for customers, because ultimately they're the ones who then get a common deployment tool, or a common observability tool, or HashiCorp Vault: you can manage secrets in a Lambda function from HashiCorp Vault. I mean, that's super cool. Also, AWS services are picking this up, because it's easy for them to do stuff. If anybody's used Lambda Insights, or even seen Lambda Insights in the console, it's somewhere in the monitoring thing, and you just click something over and you get this tool which can pull stuff that you can't normally get from a Lambda function, things like CPU time and network throughput, which you couldn't normally get. But actually, under the hood, Lambda Insights is using Lambda Extensions. And you can see that if you look: it automatically adds the Lambda layer, and job done.

So anyway, this is how a lot of the tools work: a layer is just added to a Lambda function, and off you go, the tool can do its work. There's also very much a simplicity angle on this, in that in a lot of cases you don't have to do anything. You configure some of the extensions via environment variables, if that's needed; you may just have an API key, or a log retention value, or something like that. You just configure that as a normal Lambda environment variable, add this partner extension, which is just a Lambda layer, and off you go. Super simple.

Jeremy: Right. So explain Extensions exactly, because I think that's one of those things: now we have Lambda layers and we have Lambda Extensions, and there's also the Runtime API, and then something else. I mean, even I'm not 100% sure about all of the naming conventions. I'm pretty sure I know what they do ...

Julian: Yeah, fair enough.

Jeremy: ... but maybe we could say the names and say exactly what they do as well.

Julian: Yeah, cool. You get an API, I get an API, everybody gets an API. So, Lambda layers, let's just start there, because although they're not related to Extensions, they're how Extensions are delivered to your Lambda functions. A Lambda layer is just another way to add code to a Lambda function, or not even code, it can be a dependency. And it's cool because layers are shareable. So if you have some dependencies, or a library, or an SDK, or some training data for something, a Lambda layer just allows you to add some bits and bobs to your Lambda function. That's a horrible explanation. There's another word I was thinking of, because I don't want to use the word "code," because it's not necessarily code; it's a dependency, whatever. It's just another way of adding something. I'll wake up in a cold sweat tonight thinking of the word I was thinking of, but anyway.

But Lambda Extensions introduces a whole new companion API. The Runtime API is the little bit of code that allows your function to talk to the Lambda service. When an event comes in from the outside, and this could be via API Gateway, or via the Lambda API, or EventBridge, or Step Functions, or wherever,
the Lambda service transports that data as an HTTP call, and Lambda transposes it into an event and sends it on to the Lambda function. It's that API that manages that. And just as a sidebar, what I find cool, on a sort of geeky, technical level, is that the API actually sits within the execution environment. People are like, "Oh, that's weird. Why would your Lambda API sit within the execution environment, basically within the bubble that contains your function, rather than on the Lambda service?"

And the cool answer is that it's actually a security mechanism. Your function can then only ever talk to the Lambda Runtime API, which is in that secure execution environment. And so our security can be a lot stronger, because we know that no function code can ever talk directly out of your function into the Lambda service; it's all got to talk locally. The Lambda service then gets the response from the Runtime API and sends it back to the caller, or whatever. Anyway, sidebar; thought that was nerdy and interesting. So what we've now done is we've released a new Extensions API. The Extensions API is another API that an extension can use to get information from Lambda. And there are two different types of extensions, just briefly: internal and external extensions.

Now, internal extensions run within the runtime process, so an internal extension is basically just another thread. You can use this for Python or Java or something and say, when the runtime starts, let's start it with another parameter and also run this agent that may do some observability, or logging, or tracing, or finding out how long the modules take to launch, for example. I know there's an example of that for Python. So that's one way of doing extensions, internal extensions, and there are two different flavors of those, but I'll provide a link to the blog posts before we go too far down the rabbit hole on that.

And then the other part of Extensions is external extensions. And this is the cool part, because they actually run as completely separate processes, but still within that secure bubble, the secure execution environment that Lambda runs in. And this gives you some superpowers, if you want. First of all, an extension can be written in any language, because it's a separate process. So if you've got a Node function, you could write an extension in other kinds of languages. What we do recommend is that you write your extension as a compiled binary, just because you've got to provide the runtime that the extension's got to run in anyway, so a compiled binary is super easy and super useful. Something like Go is what a lot of people are doing, because you write a single extension in Go and then you can use it with your Node functions, your Java functions, your PowerShell functions, whatever. So that's a really good, simple way to get portability.

But now, what can these extensions do? Well, an extension basically registers with the Extensions API and then says to Lambda, "Lambda, I want to know what happens when my function invokes." The extension can start up; maybe it's got some initialization code, maybe it needs to connect to a database, or log into an observability platform, or pull down a secret. That it can do, as it's got its own init that can happen, and then it's basically ready to go before the function invokes. The extension registers and says, "I want to know when the function invokes and when it shuts down." Cool.
That's just something it registers for with the API. Then what happens is, when a function invoke comes in, Lambda tells the Runtime API, "Hello, you now have an event," and sends it off to the Lambda function, which the runtime manages; but the extension, or extensions, multiple ones, also hear information about that event. It can tell them the time it's going to run and some metadata about that event. So the extension doesn't get the actual event data itself, but it's like the Lambda context; a version of that is what gets sent to the extension.

The extension can use that to do various things. It can start collecting telemetry data. It can instrument some of your code. It could be managing a secret as a separate process that it caches in the background. For example, we've got one with AppConfig, which is really cool. AppConfig is a service where you manage parameters external to your Lambda function. Well, if on each warm invoke your Lambda function has got to do an external API call to retrieve that, it's going to be a little bit inefficient. First of all, you're going to pay for it, and it's going to take some time.

So, since the extension can run before the Lambda function: why don't we just cache that locally? Then when your Lambda function runs, it just makes a local HTTP call to the extension to retrieve that value, which is going to be super quick. And some extensions are clever because they're their own process: they'll go, "Well, my value is good for 30 minutes, and every 30 minutes, if I haven't been run, I will update the value." So that's useful. Then, when the runtime is finished, it sends its response back to the Runtime API, and the extension can carry on processing, saying, "Oh, I've got the information about this. I know that this Lambda function has done X, Y, Z, so let me do some telemetry. If I'm writing logs, I could write a log to S3, or to Kinesis, or whatever; do some kind of thing after the actual function invocation has happened." And then when it's ready, it says, "Hello, Extensions API, I'm telling you I'm done," and then it's gone. Lambda then freezes the execution environment, including the runtime and the extensions, until another invocation happens, and the cycle repeats.

And then the last little bit is that, besides an invoke coming in, we've extended the Lambda lifecycle: when the environment is going to be shut down, the extension receives the shutdown and can actually do some stuff, saying, "Okay, well, I was connected to my observability platform, so let me close that connection. I've got some extra logs to flush out. I've got whatever else I need to do," and just be able to cleanly shut down that extra process that is running in parallel to the Lambda function.
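(For the curious, the shape of an external extension is small enough to sketch. A hedged, minimal example against the documented Extensions API: the extension name is hypothetical, and it assumes a Node.js 18+ executable bundled into the layer so that global fetch exists.)

```typescript
// Minimal external extension: register, then block on event/next in a loop.
const base = `http://${process.env.AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension`;

async function main(): Promise<void> {
  // 1. Register for INVOKE and SHUTDOWN lifecycle events. The extension name
  //    must match this executable's file name inside the layer (hypothetical here).
  const register = await fetch(`${base}/register`, {
    method: "POST",
    headers: { "Lambda-Extension-Name": "demo-extension" },
    body: JSON.stringify({ events: ["INVOKE", "SHUTDOWN"] }),
  });
  const extensionId = register.headers.get("lambda-extension-identifier") ?? "";

  // 2. Long-poll for lifecycle events; each call blocks until Lambda has one.
  for (;;) {
    const res = await fetch(`${base}/event/next`, {
      headers: { "Lambda-Extension-Identifier": extensionId },
    });
    const event = (await res.json()) as { eventType: string; requestId?: string };

    if (event.eventType === "SHUTDOWN") {
      // Flush telemetry, close connections, then exit cleanly.
      break;
    }
    // INVOKE carries metadata (requestId, deadline), not the event payload itself.
    console.log("extension saw invoke", event.requestId);
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```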
Jeremy: All right.

Julian: So that was a lot of words.

Jeremy: That was a lot, and I bet that would be great conversation for a dinner party. Really kicks things up. Now, the good news is, first of all, thank you for that, though. I mean, that's super technical and super in-depth. And for anyone listening who ...

Julian: You did ask. I did warn you.

Jeremy: ... kind of lost their way ... yes. But something that is really important to remember is that you likely don't have to write these yourself, right? There are all those companies you mentioned earlier, all those partners. They've already done this work. They've already figured this out, and they're providing you access to their tools via this, which allows you to build things.

Julian: Exactly.

Jeremy: So if you want to build an extension and you want to integrate your product with Lambda or so forth, then maybe go back and listen to this at half speed. But for those of you who just want to take advantage of it because of the great functionality, a lot of these companies have already done that for you.

Julian: Correct. And that's the sort of easiness thing: just adding the Lambda layer, or including it in a container image. You don't have to worry about any of that, but behind the scenes there's some really cool functionality where we're literally opening up how Lambda operates and allowing you to have an impact on when a function responds.

Jeremy: All right. All right. So let me ask another, maybe overly technical question. I have heard, and I haven't experienced this, that when it runs the lifecycle that ends the Lambda function, it doesn't send the information right away, right? You have to wait for that Lambda to expire, or something like that?

Julian: Well, yes, for now, but that's about to change. So, Extensions is currently actually in preview, and that's not because it's in beta or anything like that, but because we spoke to the partners and we didn't want to dump Extensions on the world and have all the partners come out with their extensions on day one and then try to figure out how customers were going to use them. So what we really did, which I think in this case worked out really well, is we released this in preview mode and gave everybody a whole bunch of months to work out the best use cases and how we can best use this. Some partners said, "Oh, amazing. We're ready to go." And some partners said, "Ah, it wasn't quite what we thought. Maybe we're going to wait a bit, or we're going to do something differently, or we've got some cool ideas, just give us time." And that's what this time has been.

The one other thing that has happened is we've actually added some performance enhancements during it. So yes, currently during the preview, the runtime and all extensions need to finish before we give the response back from your Lambda function. If you're in an asynchronous mode, you don't really care, but obviously if you're in a synchronous mode behind an API, you don't really want that. But when Extensions goes GA, which isn't going to be long, that is no longer the case. Basically what will happen is the runtime will respond, the result goes directly back to whoever's calling, maybe API Gateway, and the extensions can carry on, partly asynchronously, in the background.

Jeremy: Yep. Awesome. All right. And I know that the plan is to go GA soon. I'm not sure, around when this episode comes out, when that will be, but soon, so that's good to know.

Julian: And in fact, when we go GA, that performance enhancement is part of the GA. So when it goes GA, it's not something else you need to wait for.

Jeremy: Perfect. Okay. All right.
So let's move on to another bit of, I don't know if this is extensibility of the actual product itself, or more, I think, extensibility of the workflow that you use to deploy to Lambda and deploy your serverless applications, and that's container image support. I mean, we've discussed it a lot. I think people kind of have an idea, but just give me your quick overview of what that is to set some context here.

Julian: Yeah, sure. Well, container image support, as a simple sort of headline thing, is being able to build and package your functions as a container image. You basically build a function using a Dockerfile. Before, you'd use a zip function, and a lot of people use Serverless Framework or SAM, or whatever, so that's all abstracted away from you, but it's actually creating a zip file and uploading it to Lambda or S3. With container image support, you use a Dockerfile to build your Lambda function. That's the headline of what's happening.

Jeremy: Right. And so, and again, you mentioned packaging, right? I mean, that is the big thing here. This is a packaging format. You're not actually running the container in a Lambda function.

Julian: Correct. Yeah, let's maybe unpack that, because, I mean, "containers," in inverted commas again for people who are listening on audio, is ...

Jeremy: What does it even mean?

Julian: Yeah, exactly. It can be quite an overloaded term and definitely causes some confusion. And I think there are maybe four things in the container world. One: containers as an isolation mechanism. On Linux, this is cgroups, namespaces, seccomp, and other bits and pieces that can be used to isolate processes, or maybe groups of processes. Two: containers as a packaging mechanism. This is what Docker really popularized, and it's about taking some code and the dependencies needed to run the code, and packaging them all together, maybe with some metadata to describe it.

Three: containers as a design philosophy. This is the idea that if we can package and isolate software, it's easier to run. Smaller pieces of software are easier to reason about and manage independently. I don't necessarily want to say "microservices," but there's some component of that with it. The emphasis here is on software rather than services, and standardized tooling to simplify your ops. And then four: containers as an ecosystem. This is where all the products, tools, and know-how, all the actual things for how to do containers, live. And I mean, these are certainly useful, but I wouldn't say they're tied to the other kinds of things.

What is cool and worth appreciating is how independent these things are. When I spoke about containers as isolation, well, we could actually replace containers-as-isolation with micro VMs, such as we do with Firecracker, and there's no real change in the operational properties. So if we think about what we are doing with containers and why, that one is, in a way, ticked off with Lambda. Lambda does have secure isolation. And containers as a packaging format: I mean, you could replace it with static linking, and there maybe wouldn't really be a change, but there'd be less convenience.
And the design philosophy, that could really be applicable if we're talking microservices; you can have instances, and certainly functions, and containers, and it's all the same kind of thing.

So if we talk about the packaging of Lambda functions, it's really for people who are more familiar with containers. Why does Lambda have to be different? Why does Lambda have to be a snowflake, in a way, that you have to manage differently? If you are packaging dependencies, and you're doing npm or pip install, and you're used to building Dockerfiles, well, why can't we just do that for Lambda as well? And we've got some other things that come with that: larger function sizes, up to 10 gig, which is enabled with some of this technology. So it's a packaging format, but on the backend there's a whole bunch of different stuff which had to be done to allow this. The benefits are: use your tooling. You've got your CI/CD pipelines already for containers; well, you can use those.

Jeremy: Yeah, yeah. And I actually like that idea too. When I first heard of it, I was like, I have nothing against containers, containers are great. But when I was thinking about it, I'm like, "Wait, container? No, what's happening here? We're losing something." But I will say, when Lambda layers came out, which was, I think, maybe 2019 or something like that, maybe 2018, the idea of it made a lot of sense: being able to supplement, to add additional dependencies or code or whatever. But it always just seemed awkward, and some of the publishing for it was a little bit awkward. The versioning used a numbered versioning instead of semantic versioning and things like that. And then you had to share it to multiple places, and if you published it as a SAR app, then you got global distri ... Anyways, it was a little bit hard to use.

And so when you're trying to package large dependencies and put those in a layer and then combine them with a Lambda function, the other problem you had was that you still had a maximum size that you could use when those were combined. So I like this idea of saying, "Look, I'd like to just create this little isolate," like you said, "and put my dependencies in there." Whether that's PyTorch or some other big dependency that maybe I don't want to install directly in a Lambda layer, or don't want to do directly in my Lambda function. You do that together, and then that whole process just gets a lot easier. And then you can actually run those containers; you could run them locally and test them if you wanted to.

Julian: Correct. So that's also one of the sort of superpowers of this, and that's what I was talking about with just being able to package them up: that now enables a whole bunch of extra stuff. So first of all, you can use those container images that you've created for your local testing. And I know it's silly for anyone to poo-poo local testing, and we do like to say, "Bring your testing to the cloud rather than bringing the cloud to your testing." But testing locally for unit tests is super great. It's going to be super fast, and you can iterate on your Lambda functions; we just don't want to be mocking all of DynamoDB or building harebrained S3 mocks locally.

But the cool thing is, the same Dockerfile that builds the function you're going to run in Lambda can be the same Dockerfile that builds the function you run locally.
And it is literally exactly the same Lambda function that's going to run. And yes, that may be locally, but, with a bit of a stretch of kind of stuff, you could also run those Lambda functions elsewhere. So even if you need to run it on EC2 instances or ECS or Fargate or some kind of thing, this gives you a lot more opportunities to be able to use the same Lambda function, maybe in different ways, shapes or forms, even if it is on-prem. Now, obviously you can't recreate all of Lambda because that's connected to IAM and it's got huge availability, and scalability, and latency and all that kind of things, but you can actually run a Lambda function in a lot more places.Jeremy: Yeah. Which is interesting. And then the other thing I had mentioned earlier was the size. So now the size of these containers or these packages can be much, much bigger.Julian: Yeah, up to 10 gig. So the serverless purists in the back are shouting, "What about cold starts? What about cold starts?"Jeremy: That was my next question, yes.Julian: Yeah. I mean, zip function archives are also still available, nothing changes with that. Lambda layers, which many people use and love, are all still available. This isn't a replacement, it's just a new way of doing it. So now we've got Lambda functions that can be up to 10 gig in size and surely, surely that's got to be insane for cold starts. But actually, part of what I was talking about earlier, some of the work we've done on the backend to support this, is to be able to support these super large package sizes. And the high-level thing is that we actually cache those things really close to where the Lambda function is going to be run.Now, if you know the Docker ecosystem, you build your Docker files based on base images, and for Lambda this needs to be Linux. One of the super things with the container image support is you don't have to use Amazon Linux or Amazon Linux 2 for Lambda functions, you can actually now build your Lambda functions also on Ubuntu, Debian or Alpine or whatever else. And so that also gives you a lot more functionality and flexibility. You can use the same Linux distribution, maybe across your entire suite, be it on-prem or anywhere else.Jeremy: Right. Right.Julian: And there are two little components. There's a runtime interface client, which you install, and it's just another Docker layer. That's the runtime API shim that talks to the runtime API. And then there's a runtime interface emulator, and that's the thing that pretends to be Lambda, so you can shunt those events between HTTP and JSON. And that's the thing you would use to run locally. So the runtime interface client means you can use any Linux distribution, add the runtime interface client, and you're compatible with Lambda; and then the interface emulator is what you would use for local testing, or if you want to spread your wings and run your Lambda functions elsewhere.Jeremy: Right. Awesome. Okay. So the other thing I think that container support does is, I think it opens up a broader set of, or I guess a larger audience of people who are familiar with containerization and how that works, bringing them to Lambda functions. And one of the things that you really don't get when you run a container, I guess, on EC2, or, not EC2, I'm sorry, ECS, or Fargate or something like that, without kind of adding another layer on top of it, is the eventing aspect of it. I mean, Lambda just is naturally an event-driven compute layer, right?
And so, eventing and this idea of event-driven applications and so forth have just become much more popular and I think much more mainstream. So what are your thoughts? What are you seeing in terms of, especially working with so many customers and businesses that are using this now, how are you seeing this sort of evolution or adoption of event-driven applications?Julian: Yeah. I mean, it's quite funny to think that event-driven applications were actually the genesis of Lambda rather than serverless itself. I mentioned earlier about starting with S3. Yeah, the whole crux of Lambda has been, I respond to an event off API Gateway, or something on SQS, or via the API or anything. And so the whole point in a way of Lambda has been this event-driven computing, which I think people are starting to understand as a bigger thing than, "Oh, this is just the way you have to do Lambda." Because I do think that serverless has a unique challenge where there is a new conceptual learning maybe that you have to go through. And one other thing that holds back serverless development is, people are used to client-server and maybe ports and sockets. And even if you're doing containers or on-prem, or EC2, you're talking IP addresses and load balancers, and sockets and firewalls, and all this kind of thing.But ultimately, when we're building these applications that are going to be composed of multiple services talking together through using APIs and events, the events are actually going to be a super important part of it. And I know he is, not for so much longer, my ultimate boss, but I can blame Jeff Bezos just a little bit, because he did say that, "If you want to talk via anything, talk via an API." And he was 100% right and that was great. But now we're sort of evolving so that it doesn't just have to be an API, and it doesn't have to be something behind API Gateway or some API that you can run. And you can use the sort of power of events, particularly in an asynchronous model, to not just be "forced" again in inverted commas to use APIs, but have far more flexibility of how data and information is going to flow through, maybe not just your application, but your suite of applications, or to and from your partners, or wherever that is.And ultimately applications are going to be distributed, and maybe that is connecting to partners, that could be SaaS partners, or it's going to be an on-prem component, or maybe things in other kinds of places. And those things need to communicate. And so this way of thinking about events is a super powerful way of thinking about that.
And that was Building Microservices by Sam Newman, which I actually said was sort of like my Bible for a couple of years, yes, I use that. So what are some of the other, like I know you have a favorite book on this.Julian: Well, that Building Microservices, Sam Newman, and I think there's a part two. I think it's part two, or there's another one ...Jeremy: Hopefully.Julian: ... in the works. I think even on O'Reilly's website, you can go and see some preview copies of it. I actually haven't seen that. But yeah, I mean that is a great kind of Bible. And sometimes we do conflate these microservices things with a whole bunch of stuff, but if you are talking events, you're talking about separating things. But yeah, the book recommendation I have is one called Flow Architectures by James Urquhart. And James Urquhart actually works with VMware, but he's written this book which is looking sort of at the current state and also looking into the future about how does information flow through our applications and between companies and all this kind of thing.And he goes into some of the technology. When we talk about flow, we are talking about streams and we're talking about events. So, let's maybe put some AWS words around it: streams would be something like Kinesis, events would be something like EventBridge, topics would be SNS, and SQS would be queues. And I know we've got all these things and I wish some clever person would create the one flow service to rule them all, but we're not there. And they've also got different properties, which are helpful for different things, and I know confusingly some of them merge. But James' sort of big idea is, in the future we are going to be moving data around between businesses, between applications. So how can we think of that as a flow? And what does that mean for designing applications and how we handle that?And Lambda is part of it, but even nicer, I think, is some of the native integrations where you don't have to have a Lambda function. So if you've got API Gateway talking to Step Functions directly, for example, well, that's even better. I mean, you don't have any code to manage, and if it's certainly any code that I've written, you probably don't want to manage it. So yeah. I mean this idea of flow, Lambda's great for doing some of this moving around. But we are even evolving to be able to flow data around our applications without having to do anything and just wire up some things in a console or in a terminal.Jeremy: Right. Well, so you mentioned, someone could build the ultimate sort of flow control system or whatever. I mean, I honestly think EventBridge is very close to that. And I actually had Mike Deck on the show. I think it was like episode five. So two years ago, whenever it was that the show came out. I mean, when EventBridge came out. And we were talking and I sort of made the joke, I'm like, so this is like serverless web hooks, essentially, because there was the partner integrations where partners could push events onto an event bus, which they still can do. But this has evolved, right? Because the issue was always sort of like, I would have to subscribe to web hooks, I'd have to build a web hook to get events from a particular company.
Which was great, always worked fine, but you're still maintaining that infrastructure.So EventBridge comes along, it creates these partner integrations, and now you can just push an event onto that bus, and now your applications, whether it's a Lambda function or other services, you can push them to an SQS queue, you can push them into a Kinesis stream, all these different destinations. You can go ahead and pull that data in and that's just there. So you don't have to worry about maintaining that infrastructure. And then, the EventBridge team went ahead and released the destination API, I think it's called.Julian: Yeah, API destinations.Jeremy: Event API destinations, right, where now you can set up these integrations with other companies, so you don't even have to make the API call yourself anymore, but instead you get all of the retries, you get the throttling, you get all that stuff kind of built in. So I mean, it's just really, really interesting where this is going. And actually, I mean, if you want to take a second to tell people about EventBridge API destinations, what that can do, because I think that now sort of creates both sides of that equation for you.Julian: It does. And I was just thinking over there, you've done a 10 times better job at explaining API destinations than I have, so you've nailed it on the head. And in fact it is that kind of simple. It is just, events land up in your EventBridge and you can just pump events to any arbitrary endpoint. So it doesn't have to be in AWS, it can be on-prem. It can be to your Raspberry Pi, it can literally be anywhere. But it's not just about pumping the events over there because, okay, how do we handle failover? And how do we handle throttling? And so this is part of the extra cool goodies that came with API destinations: you can, for instance, if you are sending events to some external API and you're only licensed for 1,000 invocations, not invocations, that could be too Lambda-ish, but 1,000 hits on the API every minute.Jeremy: Quotas. I think we call them quotas.Julian: Quotas, something like that. That's a much better term. Thank you, Jeremy. And some sort of quota, well, you can just apply that in API destinations, and it'll basically store the data in the meantime in EventBridge and fire that off to the API destination. If the API destination is within that sort of throttle, or if the API destination is down, well, it's going to be able to do some exponential back-off, or calm down a little bit and not over-flood this external API. And then eventually when the API does come back, it will be able to send those events. So that does just really give you excellent power, rather than maintaining all these individual API endpoints yourself and handling the availability of the endpoint API, or of whatever your code is that needs to talk to that destination.Jeremy: Right. And I don't want to oversell this to anybody, but that also ...Julian: No, keep going. Keep going.Jeremy: ... adds the capability of enhanced security, because you're not exposing those API keys to your developers or anybody else, they're all baked in and stored within the API destinations or within EventBridge. You have the ability, you mentioned this idea of not needing Lambda to maybe talk directly, API Gateway to DynamoDB or to Step Functions or something like that. I mean, the cool thing about this is you do have translation capabilities, or transformation capabilities, in EventBridge where you can transform the event.
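To make that concrete, here is a hedged boto3 sketch of wiring up an API destination; the names, endpoint, credential, and exact rate limit are all hypothetical, but the two calls shown are the real EventBridge ones. The connection stores the API key so developers never handle it, and InvocationRateLimitPerSecond is what makes EventBridge buffer and back off instead of flooding the partner API.

import boto3

events = boto3.client("events")

# A connection stores the auth material (here, a hypothetical API key),
# which is the security point Jeremy makes: the credential lives inside
# EventBridge rather than being handed out to developers.
conn = events.create_connection(
    Name="partner-api-connection",
    AuthorizationType="API_KEY",
    AuthParameters={
        "ApiKeyAuthParameters": {
            "ApiKeyName": "x-api-key",
            "ApiKeyValue": "example-secret-value",  # hypothetical secret
        }
    },
)

# The API destination wraps the external endpoint; the rate limit is what
# enforces the "1,000 hits a minute" style quota Julian describes.
dest = events.create_api_destination(
    Name="partner-api",
    ConnectionArn=conn["ConnectionArn"],
    InvocationEndpoint="https://api.example.com/orders",  # hypothetical URL
    HttpMethod="POST",
    InvocationRateLimitPerSecond=16,  # roughly 1,000 per minute
)
print(dest["ApiDestinationArn"])

From there, the destination is just another target on an EventBridge rule, so retries, back-off, and buffering all happen on the bus rather than in your own code.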
I haven't tried this, but I'm assuming it's possible to say, get an event from Salesforce and then pipe it into Stripe or some other API that you might want to pipe it into.So I mean, just that idea of having that centralized bus that can communicate with all these different things. I mean, we're talking about distributed systems here, right? So why is it different sending an event from my microservice A to my microservice B? Why can't I send it from my microservice A to company Y's microservice B or whatever? And being able to do that in a secure, reliable way, with all of that stuff kind of built in for you, I think it's amazing. So I love EventBridge. To me EventBridge is one of those services that rivals Lambda. It's, I guess, as important as Lambda is in this whole serverless equation.Julian: Absolutely, Jeremy. I mean, I'm just sitting here. I don't actually have to say anything. This is a brilliant interview and Jeremy, you're the expert. And you're just laying down all of the excellent use cases. And exactly it. I mean, I like to think we've got sort of three interlinked services which do three different things, but are awesome. Lambda, we love, if you need to do some processing or you need to do something that's literally your business logic. You've got EventBridge that can route data in and out of SaaS partners to any other kind of API. And then you've got Step Functions that can do some coordination. And they all work together, but you've got three different things that really have sort of superpowers in terms of the amount of stuff you can do with it. And yes, start with them. If you land up bumping up against anything that doesn't work, well, first of all, get in touch with me, I'll work on that.But then you can maybe start thinking about, is it containers or EC2, or that kind of thing? But using literally just Lambda, Step Functions and EventBridge, okay. Yes, maybe you're going to need some queues, topics and APIs, and that kind of thing. But ...Jeremy: I was just going to say, add DynamoDB in there for some permanent state or for some data persistence. Right? Yeah. But other than that, no, I think you nailed it. Honestly, sometimes you're starting to build applications and yeah, you're right. You maybe need a queue here and there and things like that. But for the most part, no, I mean, you could build a lot with those three or four services.Julian: Yeah. Well, I mean, even think of what you used to do before API destinations. Maybe you drop something on a queue, you'd have Lambda pull that from a queue. You have Lambda concurrency, which would be set to five per second, to then send that to an external API. If it failed going to that API, well, you've got to then dump it to Lambda destinations or to another SQS queue. You then got something ... You know, I'm going down the rabbit hole, or just put it on EventBridge ...Jeremy: You just have it magically happen.Julian: ... or we talk about removing serverless infrastructure, not normal infrastructure, and just removing even the serverless bits, which is great.Jeremy: Yeah, no. I think that's amazing. So we talked about a couple of these different services, and we talked about packaging formats and we talked about event-driven applications, and all these other things. And a lot of this stuff, even though some of it may be familiar and you could probably equate it or relate it to things that developers might already know, there is still a lot of new stuff here.
And I think my biggest complaint about serverless was not about the capabilities of it, it was basically the education and the ability to get people to adopt it and understand the power behind it. So let's talk about that a little bit because ... What's that?Julian: It sounds like my job description, perfectly.Jeremy: Right. So there we go. Right, that's what you're supposed to be doing, Julian. Why aren't you doing it? No, but you are doing it. You are doing it. No, and that's why I want to talk to you about it. So you have that series on the Well-Architected Framework and we can talk about that. There's a whole bunch of really good resources on this. Obviously, you're doing videos and conferences, well, you used to be doing conferences. I think you probably still do some of those virtual ones, right? Which are not the same thing.Julian: Not quite, no.Jeremy: I mean, it was fun seeing you in Cardiff and where else were you?Julian: Yeah, Belfast.Jeremy: Cardiff and Northern Ireland.Julian: Yeah, exactly.Jeremy: Yeah, we were all over the place together.Julian: With the Guinness and all of us. It was brilliant.Jeremy: Right. So tell me a little bit about, sort of, the education process that you're trying to do. Or maybe even where you sort of see the state of serverless education now, and just sort of where it's evolved, where we're getting best practices from, what's out there for people. And that's a really long question, but I don't know, maybe you can distill that down to something usable.Julian: No, that's quite right. I'm thinking back to my extensions explanation, which is a really long answer. So we're doing really long stuff, but that's fine. But I like to bring this back to thinking about the people aspect of IT. And we talk a lot about the technology, and Lambda is amazing and S3 is amazing and all those kinds of things. But ultimately it is still sort of people lashing together these services and building the serverless applications, and deciding what you even need to do. And so the education is very much tied with, of course, having the products and features that do lots of kinds of things. And serverless, there's always this lever, I suppose, between simplicity and functionality. And we are adding lots of knobs and levers and everything to Lambda to make it more feature-rich, but we've got to try and keep it simple at the same time.So there is sort of that trade-off, and of course with that, that obviously means not just education about Lambda and serverless, but generally, how do I build applications? What do I do? And so you did mention the Well-Architected Framework. And so for people who don't know, this came out in 2015, and in 2017 there was a Serverless Lens which was added to it, which is basically serverless-specific information for Well-Architected. And Well-Architected means bringing best practices to serverless applications. If you're building prod applications in the cloud, you're normally looking to build and operate them following best practices. And this is useful stuff throughout the software life cycle, it's not just at the end to tick a few boxes and go, "Yes, we've done that." So start early with the Well-Architected journey, it'll help you.And it just sort of answers the question, am I well architected? And I mean, that is a bit of a fuzzy question.
But the idea is to give you more confidence in the architecture and operations of your workloads, and that's not a goal in itself, but it's to reduce and minimize the impact of any issues that can happen. So what we do is we try and distill some of our questions and thoughts on how you could do things, and we built that into the Well-Architected Framework. And so the Serverless Lens has a few questions on its pillars: operational excellence, security, reliability, performance efficiency, and cost optimization. Excellent. I knew I was going to forget one of them and I didn't. So yeah, these are things like, how do you control access to an API? How do you do lifecycle management? How do you build resiliency into your application? All these kinds of things.And so with the Well-Architected Framework and the Serverless Lens, there's a whole bunch of guidance to help you do that. And I have been slowly writing a blog series to literally cover all of the questions; there are nine questions in the Well-Architected Serverless Lens. And I'm about halfway through, and I had to pause because we have this little conference called re:Invent, which requires one or two slides to be created. But yeah, I'm desperately keen to pick that up again. And yeah, that's just providing some more opinionated stuff, because the documentation is awesome and it's very in-depth and it's great when you need all that kind of stuff. But sometimes you want to know, well, okay, just tell me what to do or what do you think is best, rather than, here are the seven different options.Jeremy: Just tell me what to do.Julian: Yeah.Jeremy: I think that's a common question.Julian: Exactly. And I'll launch off from that to mention my colleague, James Beswick, he writes one or two things on serverless ...Jeremy: Yeah, I mean, every once in a while you see something from him. Yeah.Julian: ... every day. The Beswick machine of serverless. He's amazing. James, he's so knowledgeable and writes like a machine. He's brilliant. Yeah, I'm lucky to be on his team. So when you talk about education, I learn from him. But anyway, in a roundabout way, he's created this blog series, another series, called the Lambda Operations Guide. And this is literally a whole in-depth study on how to operate Lambda. And it goes into a whole bunch of things. It's sort of linked to the Serverless Lens because there's a lot of common kind of stuff, but it's also a great read if you are more nerdily interested in Lambda than just firing off a function, just to read through it. It's written in an accessible way. And it has got a whole bunch of information on how to operate Lambda and some of the stuff behind the scenes, how it works, just so you can understand it better.Jeremy: Right. Right. Yeah. And I think you mentioned this idea of confidence too. And I can tell you right now I've been writing serverless applications, well, let's see, what year is it? 2021. So I started in 2015, writing or building applications with Lambda. So I've been doing this for a while and I still get to a point every once in a while, where I'm trying to put something in CloudFormation or I'm using the Serverless Framework or whatever, and you're trying to configure something and you think about, well, wait, how do I want to do this? Or is this the right way to do it? And you just have that moment where you're like, well, let me just search and see what other people are doing.
And there are a lot of myths about serverless. There's a lot of good information out there, but there's a lot of bad information out there too. And that's something that is kind of hard to combat, but I think that maybe we could end it there. What are some of the things, the questions people are having, maybe some of the myths, maybe some of the concerns, what are those top ones that you think you could sort of ...Julian: Dispel.Jeremy: ... to tell people, dispel, yeah. That you could say, "Look, these aren't things to worry about. And again, go and read your blog post series, go and read James' blog post series, and you're going to get the right answers to these things."Julian: Yeah. I mean, there are misconceptions and some of them are just historical, where people think Lambda functions can only run for five minutes; they can run for 15 minutes. Lambda functions can also now run with up to 10 gig of RAM. Before re:Invent it was only 3 gig of RAM. That's a three times increase in Lambda functions, with a proportional three times increase in CPU. So I like to say, if you had a CPU-intensive job that took 40 minutes and you couldn't run it on Lambda, you've now got three times the CPU, so maybe you can run it on Lambda now, because it could finish within the time limit. So yeah, some of those historical things have just changed. We've got EFS for Lambda, so that kills the old idea that you can't do state with Lambda. EFS and NFS isn't everybody's cup of tea, but that's certainly going to help some people out.And then the other big one is also cold starts. And this is an interesting one because, obviously, we've sort of solved the cold start issue with connecting Lambda functions to a VPC, so that's no longer an issue. And that's been a barrier for lots of people, for good reason, and that's now no longer the case. But the other thing about cold starts is interesting because people do still get caught up on cold starts, particularly during development, because they create a Lambda function, they run it, that's a cold start, and then they update it and they run it and then go, oh, that's a cold start. And they don't sort of grok that the more you run your Lambda function, the fewer cold starts you have, just because they're warm starts. It's literally only the Lambda functions that start up at exactly the same time that will have a cold start; every subsequent Lambda function invocation for quite a while will be using a warm function.And so as usage ramps up, we see cold starts drop to small percentages of what actually happens. And when we're talking again about the container image support, that's got a whole bunch of complexity which people are trying to understand. Hopefully, people are learning from this podcast about that as well. But also with cold starts there, there are particular ways that you can construct your Lambda functions to really reduce those cold starts, and they're best practices anyway. But yeah, cold starts is also definitely one of those myths. And the other one ...Jeremy: Well, one note on cold starts too, just as something that I find to be interesting. I know that we, I even had to spend time battling with that earlier on, especially with VPC cold starts, but that's all sort of gone away now, so much more efficient. The other thing is like provisioned concurrency. If you're using provisioned concurrency to get your cold starts down, I'm not even sure that's the right use for provisioned concurrency.
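One of those construction best practices, for anyone following along, is simply doing expensive initialization once, outside the handler, so only the cold invocation pays for it. The sketch below is a generic illustration rather than anything Julian specifically prescribes, and TABLE_NAME is a hypothetical environment variable.

import os

import boto3

# This runs once per execution environment, during the cold start; warm
# invocations reuse the client and the table handle.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])  # hypothetical env var


def handler(event, context):
    # Only per-request work happens inside the handler itself.
    result = table.get_item(Key={"id": event["id"]})
    return result.get("Item")

Keeping imports and SDK clients at module scope like this also shortens every warm invocation, which is why it's considered a best practice independent of cold starts.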
I think provisioned concurrency is more just to make sure you have enough capacity because of the ramp-up time for Lambda. You certainly can use it for cold starts, but I don't think you need to, that's just my two cents on that.Julian: Yeah. No, that is true. And there are two different use cases for the same kind of thing. Yeah. As you say, Lambda is pretty scalable, but there is a bit of a ramp-up to get up to many, many, many, many thousands or tens of thousands of concurrent executions. And so yeah, using provisioned concurrency, you can get that up in advance. And yeah, some people do also use provisioned concurrency for getting those cold starts down. And that is another very valid use case, but cold starts are really only an issue for synchronous workloads. Anything that is asynchronous, you really shouldn't be caring about too much, other than from a cost perspective, because it's going to take longer to run.Jeremy: Sure. Sure. I have a feeling that the last one you were going to mention, because this one bugs me quite a bit, is this idea of no ops or some people call it ops-less, which I think is kind of funny. But that's one of those things where, oh, it drives me nuts when I hear this.Julian: Yeah, exactly. And it's a frustrating thing. And I think often, sometimes when people are talking about no ops, they have something to sell you. And sometimes what they're selling you is getting rid of something, which is never the case. It's not as though we develop serverless applications and can then get rid of half of our development team, it just doesn't work like that. And it's crazy, in fact. And when I was talking about the people aspect of IT, this is a super important thing. And me coming from an infrastructure background, everybody is dying in their jobs to do more meaningful work and to do more interesting things and have the agility to try those experiments or try something else. Or do something that's better, or even improve the way you build or improve the way your CI/CD pipeline runs or anything, rather than just having to do a lot of work in the lower levels.And this is what serverless really helps you do: we'll take over a whole lot of the ops for you, but it's not all of the ops, because in a way there's never an end to ops. Because you can always do stuff better. And it's not just the operations of deploying Lambda functions and limits and all that kind of thing. But I mean, think of observability, and knowing not just about your application, but about your business. Think of if you had the time, so you weren't just monitoring function invocations and monitoring how long things were taking, but imagine if you were able to pull together dashboards of exactly what each transaction costs as it flows through your whole entire application. Think of the benefit of that to your business, or think of the benefit that in real time, even if it's on Lambda function usage or something, you can say, "Well, oh, there's an immediate drop-off or pick-up in one region in the world or one particular application." You can spot that immediately. That kind of stuff, you just haven't had time to play with, to actually build.But if we can take over some of the operational stuff for you and run one or two or trillions of Lambda functions in the background, just to keep this all ticking along nicely, you're always going to have an opportunity to do more ops.
But I think the exciting bit is that ops is not just IT infrastructure, plumbing ops; you can start doing even better business ops, where you can have more business visibility and more cool stuff for your business, because we're not writing apps just for funsies.Jeremy: Right. Right. And I think that's probably maybe a good way to describe serverless: it allows you to focus on more meaningful work and more meaningful tasks maybe. Or maybe not more meaningful, but more impactful on the business. Anyways, Julian, listen, this was a great conversation. I appreciate it. I appreciate the work that you're doing over at AWS ...Julian: Thank you.Jeremy: ... and the stuff that you're doing. And I hope that there will be a conference soon that we will be able to attend together ...Julian: I hope so too.Jeremy: ... maybe grab a drink. So if people want to get a hold of you or find out more about serverless and what AWS is doing with that, how do they do that?Julian: Yeah, absolutely. Well, please get hold of me anytime on Twitter, that's the easiest way probably, julian_wood. Happy to answer your questions about anything serverless or Lambda. And if I don't know the answer, I'll always ask Jeremy, so you're covered twice over there. And then, three different things. If you're talking specifically Lambda, James Beswick's operations guide, have a look at that. Just so many nuggets of super information. We've got another thing, we did just sort of jump around, but you were talking about CloudFormation and the spark was going off in my head. We have something which we're calling the Serverless Patterns Collection, and this is really super cool. We didn't quite get to talk about it, but if you're building applications using SAM, the Serverless Application Model, or using the CDK, so either way, we've got a whole bunch of patterns which you can grab.So if you're pulling something from S3 to Lambda, or from Lambda to EventBridge, or SNS to SQS with a filter, all these kinds of things, they're literally copy-and-paste patterns that you can put immediately into your CloudFormation or your CDK templates. So when you are down the rabbit hole of Hacker News or Reddit or Stack Overflow, this is another resource that you can use to copy and paste. So go for that. And that's all hosted on our cool site called serverlessland.com. So that's serverlessland.com, and that's an aggregation site that we run, because we've got video talks, and we've got blog posts, and we've got learning path series, and we've got a whole bunch of stuff. Personally, I've got a learning path series coming out shortly on Lambda extensions and also one on Lambda observability. There's one coming out shortly on container image support. And our team is talking at as many things as we can virtually. I'm actually speaking about container images at DockerCon, which is coming up, which is exciting.And yeah, so serverlessland.com, that's got a whole bunch of information. That's just an easy one-stop shop where you can get as much information about AWS services as you can. And if not, get in touch, I'm happy to help. I'm happy to also carry your feedback. And yeah, at the moment, just inside, we're sort of doing our planning for the next cycle of what Lambda and all the serverless stuff we're going to do. So if you've got an awesome idea, please send it on.
And I'm sure you'll be super excited when something pops out in the near future, maybe for a cool new piece of functionality you could have been involved in.Jeremy: Well, I know that serverlessland.com is an excellent resource, and it's not that the AWS Compute Blog is hard to parse through or anything, but serverlessland.com is certainly a much easier way to get there.
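For a flavor of what the Serverless Patterns Collection entries Julian mentions look like, here is a hedged sketch of one common pattern, S3 triggering Lambda, written with the CDK in Python; the construct names and asset path are hypothetical, and this is an illustration rather than an actual entry from the collection.

from aws_cdk import Stack
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_s3_notifications as s3n
from constructs import Construct


class S3ToLambdaStack(Stack):
    # Hypothetical pattern: invoke a function for every object created.
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        bucket = s3.Bucket(self, "UploadsBucket")

        fn = _lambda.Function(
            self,
            "ProcessorFunction",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_asset("src"),  # hypothetical asset path
        )

        # Wires up the event notification and grants the invoke permission.
        bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED, s3n.LambdaDestination(fn)
        )

The point of the collection is that snippets in this spirit are ready to drop into an existing CDK app or CloudFormation template, so the eventing glue doesn't have to be rediscovered each time.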
About Gojko Adzic
Gojko Adzic is a partner at Neuri Consulting LLP. He is one of the 2019 AWS Serverless Heroes, the winner of the 2016 European Software Testing Outstanding Achievement Award, and the 2011 Most Influential Agile Testing Professional Award. Gojko's book Specification by Example won the Jolt Award for the best book of 2012, and his blog won the UK Agile Award for the best online publication in 2010.
Gojko is a frequent speaker at software development conferences and one of the authors of MindMup and Narakeet. As a consultant, Gojko has helped companies around the world improve their software delivery, from some of the largest financial institutions to small innovative startups. Gojko specializes in agile and lean quality improvement, in particular impact mapping, agile testing, specification by example, and behavior driven development.
Twitter: @gojkoadzic
Narakeet: https://www.narakeet.com
Personal website: https://gojko.net
Watch this video on YouTube: https://youtu.be/kCDDli7uzn8
This episode is sponsored by CBT Nuggets: https://www.cbtnuggets.com/
Transcript
Jeremy: Hi everyone, I'm Jeremy Daly and this is Serverless Chats. Today my guest is Gojko Adzic. Hey Gojko, thanks for joining me.Gojko: Hey, thanks for inviting me.Jeremy: You are a partner at Neuri Consulting, you're an AWS Serverless Hero, you've written I think, what? I think 6,842 books or something like that about technology and serverless and all that kind of stuff. I'd love it if you could tell listeners a little bit about your background and what you've been working on lately.Gojko: I'm a developer. I started developing software when I was six and a half. My dad bought a Commodore 64 and I think my mom would have kicked him out of the house if he told her that he bought it for himself, so it was officially for me.Jeremy: Nice.Gojko: And I was the only kid in the neighborhood that had a computer, but didn't have any ways of loading games on it because he didn't buy it for games. I stayed up and copied PEEKs and POKEs from a book I couldn't even understand until I made the computer make weird sounds and print rubbish on the screen. And that's my background. Basically, ever since, I only wanted to build software really. I didn't have any other hobbies or anything like that. Currently, I'm building a product for helping tech people who are not video editing professionals create videos very easily. Previously, I've done a lot of work around consulting. I've built a product that is used by millions of school children worldwide to collaborate and brainstorm through mind-mapping. And since 2016, most of my development work has been on Lambda and on team stuff.Jeremy: That's awesome. I joke a little bit about the number of books that you wrote, but the ones that you have, one of them's called Running Serverless. I think that was maybe two years ago. That is an excellent book for people getting started with serverless. And then, one of my probably favorite books is Humans Vs Computers. I just love that collection of tales of all these things where humans just build really bad interfaces into software and just things go terribly.Gojko: Thank you very much. I enjoyed writing that book a lot. One of my passions is finding edge cases. I think people with a slight OCD like to find edge cases and in order to be a good developer, I think somebody really needs to have that kind of intent, and really look for edge cases everywhere.
And I think collecting these things was my idea to help people, first of all, think about building better software, and to realize that stuff we might glance over like, nobody's ever going to do this, actually might cause hundreds of millions of dollars of damage ten years later. And thanks very much for liking the book.Jeremy: If people haven't read that book, I don't know, when did that come out? Maybe 2016? 2015?Gojko: Yeah, five or six years ago, I think.Jeremy: Yeah. It's still completely relevant now though and there's just so many great examples in there, and I don't want to spend the whole time talking about that book, but if you haven't read it, go check it out because it's these crazy things like police officers entering in no plates whenever they're giving parking tickets. And then, when somebody actually gets that plate, they end up with thousands of parking tickets, and it's just crazy stuff like that. Or, not using the middle initial or something like that for the name, or the birthdate or whatever it was, and people constantly getting just ... It's a fascinating book. Definitely check that out.But speaking of edge cases and just all this experience that you have just dealing with this idea of, I guess, finding the problems with software. Or maybe even better, I guess a good way to put it is finding the limitations that we build into software mostly unknowingly. We do this unknowingly. And you and I were having a conversation the other day and we were talking about way, way back in the 1970s. I was born in the late '70s. I'm old but hopefully not that old. But way back then, time-sharing was a thing where we would basically have just a few large computers and we would have to borrow time against them. And there's a parallel there to what we were doing back then and I think what we're doing now with cloud computing. What are your thoughts on that?Gojko: Yeah, I think absolutely. We are I think going in a slightly cyclic way here. Maybe not cyclic, maybe spirals. We came to the same horizontal position but vertically, we're slightly better than we were. Again, I didn't start working then. I'm like you, I was born in the late '70s. I wasn't there when people were doing punch cards and massive mainframes and time-sharing. My first experience came from home PC computers and later PCs. The whole serverless thing, people were disparaging about that when the marketing buzzword came around. I don't remember exactly when serverless became serverless because we were talking about microservices and Lambda was a way to run microservices and execute code on demand. And all of a sudden, I think the JAWS people realized that JAWS is a horrible marketing name, and decided to rename it to serverless. It was probably 2017 or something like that ...Jeremy: Something like that, yeah.Gojko: Something like that. And then, because it is a horrible marketing name, but it's catchy, it caught on and then people were complaining how it's not serverless, it's just somebody else's servers. And I think there's some truth to that, but actually, it's not even somebody else's servers. It really is somebody else's mainframe in a sense. You know, in the '70s and early '80s, before the PC revolution, if you wanted to be a small software house or a small product operator, you probably were not running your own data center. What you would do is you would rent it, paying for time to one of these massive, massive operators.
And in fact, we ended up with AWS being a massive data center. As far as you and I are concerned, it's just a blob. It's not a collection of computers, it's a data center we rent something from, and Google is another one, and then Microsoft is another one.And I remember reading a book about Andy Grove, who was the CEO of Intel, where they were thinking about the market for PC computers in the late '70s when somebody came to them with the idea that they could repurpose what became an 8080 processor. They were doing this I think for some Japanese calculator and then somebody said, "We can attach a screen to this and make this a universal computer and sell it." And they realized maybe there's a market for four or five computers in the world like that. And I think that that's ... You know, we ended up with four or five computers, it's just the definition of a computer changed.Jeremy: Right. I think that's a good point because you think about after the PC revolution, once the web started becoming really big, people started building data centers and colocation facilities like crazy. This is way before the cloud, and everybody was buying racks, and Dell was getting really popular because people were buying servers from Dell and installing these in their data centers and doing this. And it just became this massive, whole industry built around doing that. And then you have these few companies that say, "Well, what if we just handled all that stuff for you?" Rather than just racking stuff for you, they started managing the software, and the networking, and the backups, and all this stuff for you. And that's where the cloud was born.But I think you make a really good point where the cloud, whatever it is, Amazon or Google or whatever, you might as well just assume that that's just one big piece of processing and you're renting some piece of that. And maybe we have. Maybe we've moved back to this idea where ... Even though everybody's got a massive computer in their pocket now, tons of compute power, in terms of the real business work that's being done, and the real global value, and the things that are powering global commerce and everything else like that, those are starting to move back to run on four or five massive computers.Gojko: Again, there's a cyclic nature to all of this. I remember reading about the advent of power networks. Because before people had electric power, there were physical machines and movement through physical power, and there were water-powered plants and things like that. And these whole systems of shafts and belts and things like that powering factories. And you had this one kind of power source in a factory that was somewhere in the middle, and then from there, you actually had physical belts, rotating cogs in other buildings, and that was rotating some shafts that were rotating other cogs, and things like that.First of all, when people were able to package up electricity into something that's distributable, they were running their own small electricity generators next to these big massive machines that were powering early factories. And one of the first effects of that was they could use 30% of their factories better, because up to 30% of the workspace in the factory was taken up by all the belts and shafts. And all that movement was producing a lot of air movement and a lot of dust and people were getting sick.
But now, you just plug in a cable and you no longer have all this bad air and you don't have employees getting sick and things like that. Things started changing quite a lot and then all of a sudden, you had this completely new revolution where you no longer had to operate your own electric generator. You could just plug in and get power from the network.And I think part of that is, again, cyclic, what's happening in our industry now, where, as you said, we were getting machines. I used to make money as a Linux admin a long time ago and I could set up my own servers and things like that. I had a company in 2007 where we were operating our own gaming system, and we actually had physical servers in a physical server room with all the LEDs and lights, and bleeps, and things like that. Around that time, AWS really made it easy to get virtual machines on EC2 and I realized how stupid the whole, let's manage everything ourselves, is. But we got to the point where people no longer had to run their own generators; now you can actually just plug into the electricity network. And of course, there is some standardization. Maybe the U.S. still has 110 volts and Europe has 220, and we never really got global standardization there.But I assume before that, every factory could run whatever voltage they wanted. It was difficult to manufacture for these things, but now that you have standardization, it's easier for everybody to plug into the ecosystem, and then the whole ecosystem emerged. And I think that's partially what's happening now, where things like S3 is an API or Lambda is an API. It's basically the electric socket in your wall.Jeremy: Right, and that's that whole Wardley maps idea, they become utilities. And that's the thing where if you look at that from an enterprise standpoint or from a small business standpoint, if you're a startup right now and you are ordering servers to put into a data center somewhere, unless you're doing something that's specifically for servers, that's just crazy. Use the cloud.Gojko: This product I mentioned that we built for mind mapping, there's only two of us in the whole company. We do everything from presales, to development, testing, support, to everything. And we're competing with companies that have several orders of magnitude more employees, and we can actually compete and win because we can benefit from this ecosystem. And I think this is totally wonderful and amazing, and for anybody thinking about starting a product, it's easier to start a product now than ever. And another thing that's totally I think crazy about this whole serverless thing is how in effect we got a bookstore to offer that first.You mentioned the word utility. I remember I was the editor of a magazine in 2001 in Serbia, and we had licensing with IDG to translate some of their content. And I remember working on this kind of piece from I think PC World in the U.S. where they were interviewing Hewlett Packard people about utility computing. And people from Hewlett Packard back then were predicting that in a few years' time, companies would not operate their own stuff, they would use utility computing and things like that. And it's totally amazing that in order to reach us over there, that had to be something that was already evaluated and tested, and there was probably a prototype and things like that. And you had all these giants. Hewlett Packard in 2001 was an IT giant. Amazon was just up-and-coming then and they were a bookstore then. They were not even anything more than a bookstore. And you had, what?
A decade later, the tables completely turned where HP's ... I don't know ...Jeremy: I think they bought Compaq at some point too.Gojko: You had all these giants, IBM completely missed it. IBM totally missed ...Jeremy: It really did.Gojko: ... the whole mobile and web and everything revolution. Oracle completely missed it. They're trying to catch up now but fat chance. Really, we are down to just a couple of massive clouds, or whatever that means, that we interact with as we're interacting with electricity sockets now.Jeremy: And going back to that utility comparison, or, not really a comparison. It is a utility now. Compute is offered as a utility. Yes, you can buy and generate compute yourself and you can still do that. And I know a lot of enterprises still will. I think cloud is like 4% of the total IT market or something. It's a fraction of it right now. But just from that utility aspect of it, from your experience, you mentioned you had two people and you built, is it MindMup.com?Gojko: MindMup, yeah.Jeremy: You built that with just two people and you've got tons of people using it. But just from your experience, especially coming from the world of being a Linux administrator, which again, I didn't administer ... Well, I guess I was. I did a lot of work in data centers in my younger days. But, coming from that idea and seeing how companies were building in the past and how companies are still building now, because not every company is still using the cloud, far from it. But not taking advantage of that utility, what are those major disadvantages? How badly do you think that's going to slow companies down that are trying to innovate?Gojko: I can give you a story about MindMup. You mentioned MindMup. When was it? 2018, when the Intel processor vulnerabilities were discovered.Jeremy: Right, yes.Gojko: I'm not entirely sure what the year was. A few years ago anyway. We got an email from a concerned university admin when the second one was discovered. The first one made all the news and a month later a second one was discovered. Now everybody knew about it, they were in a panic and things like that. After the second one was discovered, we got an email from a university admin. And universities are big users, they need to protect the data and things like that. And he was insisting that we tell him what our plan was for mitigating this thing because he knows we're on the cloud.I'm working on European time. The customer was in the U.S., probably somewhere U.S. Pacific because it arrived in the middle of the night. I woke up, still trying to get my head around it and drinking coffee, and there's this whole CVE number that he sent me. I have no idea what it's about. I took that, pasted it into Google to figure out what's going on. The first result I got from Google was that AWS Lambda was already patched. Copy, paste, my day's done. And I assume lots and lots of other people were having a totally different conversation with their IT department that day.
And that's why I said I think for products like the one I'm building with video, and for MindMup, being able to rent operations as a utility, but really totally rent ops as a utility, not have to worry about anything below my unique business level, is really, really important.And yes, we could hire people to work on that, and it could even end up being slightly cheaper technically, but in terms of my time and where my focus goes and my interruptions, I think deploying on a utility platform, whatever that utility platform is, as long as it's reliable, lets me focus on adding value where I can actually add value. That makes my product unique rather than the generic stuff.Jeremy: You mentioned the video product that you're working on too, and something that is really interesting I think too about taking advantage of the cloud is the scalability aspect of it. I remember, it was maybe 2002, maybe 2003, I was running my own little consulting company at the time, and my local high school always has a rivalry football game every Thanksgiving. And I thought it'd be really interesting if I was to stream the audio from the local AM radio station. I set up a server in my office with ReelCast Streaming or something running or whatever it was. And I remember thinking as long as we don't go over 140 subscribers, we'll be okay. Anything over that, it'll probably crash or the bandwidth won't be enough or whatever.Gojko: And that's just one of those things now, if you're doing any type of massive processing or you need bandwidth, bandwidth alone ... I remember T1 lines being great and then all of a sudden it was like, well, now you need a T3 line or something crazy in order to get the bandwidth that you need. Just from that aspect of it, the ability to scale quickly, that just seems like such a huge blocker for companies that need to order and provision servers, maybe get a utility company to come in and install more bandwidth for them, and things like that. That's just stuff that's so far out of scope for building a business to me. At least building a software business or building any business. It's crazy.When I was doing consulting, I did a bit of work for what used to be one of the largest telecom companies in the world.Jeremy: Used to be.Gojko: I don't want to name names on a public chat. Somewhere around 2006, '07 let's say, we did a software project where they just needed to deploy it internally. And it took them seven months to provision a bunch of virtual machines to deploy it internally. Seven months.Jeremy: Wow.Gojko: Because of all the red tape and all the bureaucracy and all the wait for capacity and things like that. That's around the time when Amazon's EC2 became commercially available. I remember working with another client and they were waiting for some servers to arrive so they could install more capacity. And I remember just turning on the Amazon console. I didn't have anything useful to run on it then, but just being able to start up a virtual machine in, I think it was less than half an hour, that was totally fascinating back then. Here's a new Linux machine and in less than half an hour, you can use it. And it was totally crazy. Now we're getting to the point where Lambda will start up in less than 10 milliseconds or something like that. Waiting for that kind of capacity is just insane.With the video thing I'm building, because of Corona and all of this remote teaching stuff, for some reason, we ended up getting lots of teachers using the product.
It was one of these half-baked experiments because I didn't have time to build the full user interface for everything, and I realized that lots of people are using PowerPoint to prepare that kind of video. I thought, well, how about if I shorten that loop, so you just take your PowerPoint and convert it into video. Just type up what you want in the speaker notes, and we'll use these neural voices to generate audio and things like that. Teachers like it for one reason or another.We had this influential blogger from Russia explain it on his video blog and then it got picked up, my best guess from what I could see from Google Translate, in some virtual meeting of teachers of Russia where they recommended people to try it out. I woke up the next day, the metrics went totally crazy because a significant portion of teachers in Russia tried my tool overnight in a short space of time. Something like that, I couldn't predict it. It's lovely, but as you said, as long as we don't go over a hundred subscribers, we're fine. If I was in a situation like that, the thing would completely crash because it's unexpected. We'd have a thing that's amazingly good for marketing that would be amazingly bad for business, because it would exhaust all the capacity we had. Or we'd have had to prepare for a lot more capacity than we needed. But because this is all running on Lambda, Fargate, and other auto-scaling things, it's just fine. No sweat at all. It was a lovely thing to see actually.Jeremy: You actually have two problems there if you're not running in the cloud or not running on-demand compute. One, you would've potentially failed, things would've fallen over and you would've lost all those potential customers, and you wouldn't have been able to grow.Gojko: Plus you've lost paying customers who are using your systems, who've paid you.Jeremy: Right, that's the other thing too. But, on the other side of that problem would be, you can't necessarily anticipate some of those things. What do you do? Over-provision and just hope that maybe someday you'll get whatever? That's the crazy thing where the elasticity piece of the cloud, to me, is such a no-brainer. Because I know people always talk about, well, if you have predictable workloads. Well yeah, I know we have predictable workloads for some things, but if you're a startup or you're a business that has like ... Maybe you'd pick up some press. I worked for a company where we picked up some press. We had 10,000 signups in a matter of like 30 seconds and it completely killed our backend MySQL database. Those are hard to prepare for if you're hosting your own equipment.Gojko: Absolutely, and even if you're not hosting your own.Jeremy: Also true, right.Gojko: Before moving to Lambda, the app was deployed to Heroku. That was basically, you need to predict how many virtual machines you need. Yes, it's in the cloud, but if you're running on EC2 and you have your 10, 50, 100 virtual machines, whatever, running there, and all of a sudden you get a lot more traffic, will it scale or will it not scale? Have you designed it to scale like that?
And one of the best things that I think Lambda brought as a constraint was forcing people to design this stuff in a way that scales.

Jeremy: Yes.

Gojko: I can deploy stuff in the cloud and make it all a distributed monolith, so it doesn't really scale well. But with Lambda, because it was so constrained when it launched, and this is one thing you mentioned, partially we're losing those constraints now, but it was so constrained when it launched, it really forced people to design things that were easy to scale. We had total isolation, there was no way of sharing things, there was no session stickiness, and things like that. And then you have to come up with actually good ways of resolving that. I think one of the most challenging things about serverless is that even a Hello World is a distributed transaction processing system, and people don't get that. They think, well, I had this DigitalOcean five-dollar-a-month server and it was running my, you know, Rails app correctly; I'm just going to use the same ideas to redesign it in Lambda. Yes, you can, but then you're not really going to get the benefits of all of this other stuff. And if you design it as a massively distributed transaction processing system from the start, then yes, it scales like crazy. It scales up and down and it's lovely.

But as Lambda's maturing... I have this slide deck that I've been using since 2016 to talk about Lambda at conferences. Every time I need to do another talk, I pull it out and adjust it a bit, and I have this whole Git history of it, because I do markdown-to-slides and keep the markdown in Git, so I can go back. There's this slide about limitations, where originally... I don't remember what the time limitation was, but it was something very short.

Jeremy: Five minutes originally.

Gojko: Yeah, something like that. And then there was no PCI compliance, and retries were difficult, and all of this stuff basically became solved. One of the last things on there was: don't even try to put it in a VPC. You can, but it's going to take 10 minutes to start. Now that's reasonably okay as well. One thing I remember as a really important design constraint was that it was effectively a share-nothing platform, because you could not easily share data between two Lambdas running at the same time. Now that we can connect Lambdas to EFS, you effectively can do that as well. You can have two Lambdas, one writing into an EFS and the other reading the same EFS at the same time. No problem at all. One can pump data into a file, and the other can just read that file and get the data out.

As the platform matures, I think we're losing some of these design constraints, and sometimes constraints breed creativity. Yes, you still can, of course, design the system to be good, but it's going to be interesting to see. This 15-minute limit that we have in Lambda now is just an artificial number that somebody thought up.

Jeremy: Yeah, it's arbitrary.

Gojko: And at some point, when somebody important enough asks AWS to give them half-hour Lambdas, they will get that. Or 24-hour Lambdas. It's going to be interesting to see if Lambda ends up as just another way of running EC2, a way of starting EC2 that's simpler because you don't have to manage the operating system.
And I think the big difference we'll get between EC2 and Lambda is what percentage of ops your developers are responsible for, and what percentage of ops Amazon's developers are responsible for. Because if you look at all these different offerings that Amazon has, like Lightsail and EC2 and Fargate and AWS Batch and CodeDeploy, and I don't know how many other things you can run code on in AWS, the big difference with Lambda, at least until very recently, was that apart from your application, Amazon is responsible for everything. But now we're losing design constraints: you can put a Docker container in, you can be responsible for the OS image as well, which is, again, interesting to look at.

Jeremy: Well, I also wonder, if you took all those event sources that you can point at Lambda and added those to Fargate, what's the difference? It seems like they're just merging into two very similar products.

Gojko: For the video build platform, the last step runs in Fargate, because people are uploading things that are massive, massive, massive for video processing, and they just don't finish in 15 minutes. I have to run that in Fargate, and the big difference is that the container I packaged up for Fargate takes about 40 seconds to start handling a new event. I can optimize that, but I can't optimize it too much. Fargate is still an order of magnitude of tens of seconds to pick up an event. I think as Fargate gets faster and as Lambda gets more of these capabilities, it's going to be very difficult to tell them apart.

With Fargate, you're intended to manage the container image yourself. You're responsible for patching software, you're responsible for patching OS vulnerabilities, and things like that. With Lambda, unless you use a container image, Amazon is responsible for that. They come close. When looking at this video building for the first time, I was actually considering using CodeBuild for it, because CodeBuild is also a way to run things on demand in containers, and you can actually get quite decent machines with CodeBuild. And it's also event-driven, and Fargate is event-driven, and AWS Batch is event-driven, and all of these things are converging on each other. And really, AWS is famous for having 10 products that effectively do the same thing and you can't tell them apart, and maybe that's where we'll end up.

Jeremy: And I'm wondering too, the thing that was great about Lambda, at least for me, like you said, was the share-nothing architecture, where you almost didn't have to rely on anything other than the event that came in and the processing of that Lambda function. And if you designed your systems well, you might have some bottleneck up front, but especially if you used distributed transactions and async invocations of downstream functions, where you could basically pass in the data a function needed, it wouldn't need to communicate with anything other than itself to process that data. The scale there was massive. You could just keep scaling and scaling and scaling. As you add things like EFS, that adds constraints in terms of the number of transactions and connections it can make, and all those sorts of things. Do these things become less reliable?
By allowing it to do more, are we building systems that are less reliable, because we're not using some of those tried-and-true constraints that were there?

Gojko: Possibly, yes; every time you add a new moving part, you create one more potential point of failure. And I think, for me, one of the big lessons... I spent a few years working on very high-throughput transaction processing systems, which is why this whole thing rings a bell. A lot of it really was figuring out what types of messages you send and where you send them. The craze of messaging and distributed transaction processing systems in the early 2000s created the whole wave of enterprise service buses that came later. We now have this... what is it called? It's not called an enterprise service bus, it's called EventBridge, or something like that.

Jeremy: EventBridge, yes.

Gojko: That's effectively an enterprise service bus; it's just that the enterprise is the Amazon cloud. The big challenge in designing things like that is decoupling. It's realizing that when you have a complicated system like that, stuff is going to fail. And especially when you're operating close to hardware, stuff is going to fail badly occasionally, and you need to not bring the whole house down when some storage starts working a bit slower. You create circuit breakers, you create layers and layers of stuff that disconnect things. I remember when we were originally looking at Lambdas, trying to get our heads around it and experimenting: should one Lambda call another, or should one Lambda not call another, and things like that.

I realized: let's say, for now, until we decide we want to do something else, a Lambda should only ever talk to SNS and nothing else. Or SQS, or something like that. When one Lambda completes, it drops a message somewhere, and we need to design these messages to be good so that we can decouple different parts of the process. And so far, that has helped as a constraint. Very, very few times do we have one Lambda calling another: mostly when we actually need a synchronous response back, or when, for security reasons, we wanted to isolate something to a single Lambda, but that's effectively just black-box security isolation. So creating these isolation layers, through messages, through queues, through topics, becomes a fundamental part of designing these systems.

I remember speaking at a conference to somebody, I forget the person's name, who was talking about airlines. He was presenting after me and he said, "Look, I can relate to a lot of what you said." Apparently, and I'm not an airline programmer, he told me that in the airline community they talk about designing the protocol as the biggest challenge. Once you design the protocol between your components, the messages, who sends what where, you can recover from almost any other design flaw, because it's decoupled: if you've made a mess in one Lambda, you can redesign that Lambda, throw it away, rewrite it, decouple things a different way. If the global protocol is good, you get all the flexibility. If you mess up the protocol for communication, then nothing's going to save you at the end.

Now we have EFS, and a Lambda can talk to an EFS. Should this Lambda talk directly to an EFS, or should this Lambda just send some messages to a topic, and then some other Lambdas, maybe reserved, maybe more constrained, talk to the EFS?
And again, the platform has evolved quite a lot over the last few years. One thing that is particularly useful in that regard is SQS FIFO queues, which came out last year, I think. With Corona ...

Jeremy: Yeah, whenever it was.

Gojko: Yeah, I don't remember if it was last year or two years ago. But one of the things it allows us to do is run lots and lots of Lambdas in parallel while guaranteeing that no two Lambdas access the same kind of business entity at the same time. For example, for this mind mapping thing, we have lots and lots of people modifying lots and lots of files in parallel, but we need to aggregate a single map. If we have 50 people over here working with a single map and 60 people working on a different map, the aggregations can run in parallel, but I never, ever want the aggregations for two people modifying the same map to run in parallel. For Lambda, that was a massive challenge. You had to put Kinesis between Lambdas and things like that, and Kinesis is provisioned capacity: it costs a lot, and it doesn't auto-scale. But now with SQS FIFO queues, you can just send a message and say that the FIFO message group ID is this map ID. That means SQS can run thousands of Lambdas in parallel, but it will never run more than one Lambda for the same map ID at the same time. Designing your protocols like that becomes how you decouple one end of your app that's massively scalable and massively parallel from another end of your app where you have some reserved capacity or limits.
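To make that pattern concrete, here is a minimal sketch in Python with boto3. The podcast doesn't show any code, so the queue URL, function name, and payload fields here are hypothetical; the one real mechanism is MessageGroupId, which SQS FIFO uses to serialize delivery within a group while still fanning out across groups:

```python
import json

import boto3  # AWS SDK for Python

sqs = boto3.client("sqs")

# Hypothetical FIFO queue; the ".fifo" suffix is mandatory for FIFO queues.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/map-aggregation.fifo"


def enqueue_map_change(map_id: str, change: dict) -> None:
    """Queue one map change for aggregation.

    Messages sharing a MessageGroupId are delivered strictly one at a
    time, in order, so two changes to the same map are never aggregated
    in parallel, while changes to different maps still scale out.
    """
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(change),
        MessageGroupId=map_id,                 # serializes work per map
        MessageDeduplicationId=change["rev"],  # hypothetical revision field;
                                               # or enable content-based dedup
    )
```

The design point is exactly the one Gojko makes: the massively parallel front of the system and the per-entity serialized back of the system are decoupled purely by how the message protocol is designed.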
Like for this video thing: the original idea was to let me build marketing videos more easily, because I can't get rid of this accent. Unfortunately, everything I do sounds like I'm threatening to blackmail someone. I'm like a cheap Bond villain, and that's not good, but I can't do anything about it. I can pay other people to do it for me, and we used to do that, but that becomes a big problem when you want to modify tiny things. We paid a lady to professionally record audio for a marketing video that we needed, and then six months later we wanted to change one screen, and now the narration was incorrect. So we paid the same woman again. Same person, but the sound was totally different, because of two different recording setups.

Jeremy: Totally different, right.

Gojko: You can't just stitch it up. Then you end up asking, okay, do we go and pay for the whole thing again? And I realized that neural text-to-speech has learned so much that it can do English better than I can. You're a native English speaker, so you can probably defeat those machines, but I can't.

Jeremy: I don't know if I could. They're pretty good now. It's kind of scary.

Gojko: So I started looking at it like: why don't I just put stuff in Markdown and use Markdown to generate videos, and things like that? But with all of these things, you still get quota limits. We were limited on Google: Google gave us something like five requests per second in parallel, and it took me a really long time to even get those quotas raised. I don't want lots of people requesting stuff in parallel and then trashing this other thing over there. We need to create these layers that run things within a decent limit, and I think that's where designing the protocol for this distributed system becomes important.

Jeremy: I want to go back, because I think you bring up a really good point about a different type of architecture, the architectural design of decoupling systems and these event-driven things. You mentioned a Lambda function processing something and sending it to SQS, or sending it to SNS so it can do a fan-out pattern, or, in the case of the FIFO queue, doing an ordered pattern for sequential processing. Those are all great patterns. And there are things AWS has added, such as Lambda Destinations. If you ran an asynchronous Lambda function, you used to have to write some code that said, "When this is finished processing, now call some other component," and that was just another opportunity for failure. They basically said, "Well, if it succeeds, you can just forward the result off to one of these other services automatically, and we'll handle all of the retries and all the failures and that kind of stuff."

Those things have been added to basically give you that warm and fuzzy feeling that if an event doesn't reach where it's supposed to go, some sort of cloud trickery will kick in and make sure it gets processed. But what that has introduced, I think, is a cognitive overload for a lot of the developers designing these systems, because you're no longer just writing a script that does X, Y, and Z and makes a few database calls. Now you're saying: okay, I've got to write a script that can massively scale and take the transactions that I maybe need to parallelize, or queue, or delay, or throttle, or whatever, and pass them down to another subsystem. And then that subsystem has to pick them up, and maybe it has to parallelize them, or maybe there are failure modes in there, and I've got all these other things to think about.

Just consider that effect on your average developer. You and I think about these things; I would consider myself a cloud architect, if that's a thing. But essentially, do you see this being, I guess, a wall for a lot of developers, and something that really requires quite a bit of education to ramp them up to the point where they can start designing these systems?

Gojko: One of the topics we touched upon is the cyclic nature of things, and I think we're going back to where moving from apps working on a single machine to client-server architectures was a massive brain melt for a lot of people. And three-tier architectures, which came later, not just client-server, ended up with their own host of design problems and things like that. That's where a lot of these architectural patterns and design patterns emerged, like circuit breakers and things like that. I think there's a whole body of knowledge there for people to research. It's not something that's entirely new, and I think you can get started with Lambda quite easily and not necessarily make a mess, but make something that won't necessarily scale well, and then start improving it later. That's why I was mentioning earlier in the discussion that, as long as the protocol makes sense, you can salvage almost anything later. Designing that protocol is important, but then we're getting into good software design. I think teaching people how to do that is something that every 10 years we have to recycle and reinvent and figure out, because people don't like to read books that are more than 10 years old. All of this stuff, designing fault-tolerant systems and fail-safe systems and things like that, there's a ton of books about it from 20 years ago, from 10 years ago. For the people listening to you and me: they probably use Amazon more for compute than for getting books, but Amazon has all these books. Use it for what Amazon was originally intended for, get some books, and read through this stuff. I think looking at the design of distributed systems and things like that becomes really, really critical for Lambdas.
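An editor's aside on the Lambda Destinations feature Jeremy mentioned above: configuring it is a one-call setup rather than hand-written forwarding code. A hedged sketch with boto3; the function name and the target ARNs are hypothetical stand-ins:

```python
import boto3

lam = boto3.client("lambda")

# Route the outcome of asynchronous invocations without writing forwarding
# code: successful results flow to a queue for the next step, failures to an
# alerting topic, and Lambda handles the retries in between.
lam.put_function_event_invoke_config(
    FunctionName="process-upload",  # hypothetical function name
    MaximumRetryAttempts=2,
    DestinationConfig={
        "OnSuccess": {"Destination": "arn:aws:sqs:us-east-1:123456789012:next-step"},
        "OnFailure": {"Destination": "arn:aws:sns:us-east-1:123456789012:alerts"},
    },
)
```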
Jeremy: Yeah, definitely. All right, we've got a few minutes left, and I'd love to go back to something we were talking about a little bit earlier, which is everything moving onto a few of these major cloud providers. One of the issues there is scale. We talked about how we can spin up as many VMs as we want to, and now with serverless we have unlimited capacity, really. I know we didn't quite say that, but I think that's the general idea: the cloud just provides this unlimited capacity.

Gojko: Until something else decides it's not unlimited.

Jeremy: And that's my point here. With every major cloud provider that I've been involved with, and whose stories I've heard, once you start to move the needle at all, there's always a solutions architect who reaches out to you and really wants to understand what your usage and your patterns are going to be. That's because they need to make sure that where you're running your applications, they provision enough capacity, because there is not unlimited capacity in the cloud.

Gojko: It's physically limited. There are only so many buildings on the surface of Earth where you can put data centers.

Jeremy: And I guess that's where my question comes in, because you always hear these things about lock-in. With serverless, if you use Lambda, you're going to be locked in. And again, if you're using Oracle, you're locked in; or if you're using MySQL, you're locked in; or any of these other things, you're locked in.

Gojko: You're actually not locked in physically. There's a key and a lock.

Jeremy: Right. But this idea of being locked in, not to a specific cloud provider but to the cloud in general, and relying on the cloud to do that scaling for you: where do you think the limitations are there?

Gojko: I think, again, it's going back to cyclic, cyclic, cyclic. The PC revolution started when people needed a lot more compute at the edge than mainframes gave them and wanted to get stuff done on their own devices. And I think, probably, if we ever do see the limitations of this and it goes into a next cycle, my best guess is it will be driven by lots of tiny devices connected to a cloud, not necessarily computers as we know them today. I pulled out some research from IDC while preparing for this. They are predicting that the data needed for IoT will grow from 18.3 zettabytes in 2019 to 73.1 zettabytes by 2025. That's roughly four times as much in the space of six years. If you went to Amazon now and told them, "You need four times more data space in six years," I'm not sure how they would react. This stuff, everything, is taking more and more data, and everything is more and more connected to the cloud.
The impact of something like that going down now is becoming totally crazy. There was a case where S3 started getting a bit more latency than usual in US East 1, I think in February of 2017 or something like that. There were cases where people couldn't turn the lights on in their houses, because the management software was running on S3 and depending on S3, expecting S3 to be indestructible. Last year, in November, Kinesis pretty much went offline, as far as everybody outside AWS was concerned, for about 15 hours I think. There were people on Twitter saying they couldn't get back into their houses because their smart locks were no longer that smart.

And I think we are getting to a place where there will be more need for compute on the edge. First of all, there's going to be a lot more demand for data centers and cloud power, and I think that's going to keep going for the next five, ten years. But then people will realize they've hit some limitation of that, and they're going to start moving towards the edge. And we're going from mainframes back into client-server computing, I think. We're getting these products now. I assume most of your listeners have seen these fancy Ubiquiti Wi-Fi thingies that cost hundreds of dollars and look like pieces of furniture, just sitting discreetly on the wall. And there was a massive security breach published yesterday: somebody took their AWS keys and got at all the customer data and everything.

The big advantage over all the ugly routers was that it's just a thin piece of glass that sits on your wall, and it's amazing and it looks good. But the reason they could make a very thin piece of glass is that a minimal amount of software runs on that piece of glass; the rest runs in the cloud. It's not just lock-in in terms of whether it's on Amazon or Google; it's that it's so tightly coupled with something totally outside of your home. Your network router needs Amazon to be alive, and specifically a very particular region of Amazon where everybody's been deploying for the last 15 years, and which is running out of capacity often enough.

There are some really interesting questions that I guess we'll answer in the next five, ten years. We're on the verge of IoT, I think, exploding, because people are trying to come up with new products that you wouldn't even have thought of before: smart shoes and smart whatnot, smart glasses and things like that. And when that gets into consumer technology, we're no longer going to have five or ten computing devices per person, we'll have dozens and dozens. Think about it this way: fifteen years ago, how many computing devices were you carrying with you? Probably a mobile phone and a laptop, probably not more. Now, in the headphones you have there, the Bose ones...

Jeremy: Watch.

Gojko: ... you have a microprocessor in the headphones, you have your watch, you have a ton of other stuff you carry with you that's low-powered, all doing a bit of processing. A lot of that processing is probably happening in the cloud somewhere.

Jeremy: Or it's just sending data. It's just saying, hey, here's the information. And you're right. For me, I've got my Apple Watch, my thermostat is connected to Wi-Fi and to the cloud, my wife just bought a humidifier for our living room that is connected to Wi-Fi, and I'm assuming it's sending data to the cloud. I'm not 100% sure, but the question is, I don't know why we need to keep track of the humidity in my living room.
But that's the kind of thing where, as you mentioned from a security standpoint, I have a bunch of AWS access keys on my computer that I send over the network, and I'm assuming they're secure. But if I've got another device that can access my network, and somebody hacked something on the cloud side and can then get in, it gets really dangerous. And you're right, the amount of data we are now generating, and the compute we're using in the cloud, probably for some really dumb things like the humidity in my living room... is that going to reach a point where, as you said, there's going to be a limitation in five years, ten years, whatever it is? What does the cloud do then? What does the cloud do when it can no longer keep up with the pace of these IoT devices?

Gojko: Well, if history is repeating, and we'll see if history is repeating, people will start getting throttled, and all of a sudden your unlimited supply of Lambdas will no longer be an unlimited supply of Lambdas. It will be something that you have to reserve upfront and pay for upfront, and who knows, we'll see when we get there. Or we get things like we have with power networks: you had a Texas power cut there that was pretty severe, and you'd get an IT cut. I don't know, we'll see. The more we go into utility, the more we'll start seeing parallels between compute and power networks. And maybe power networks are something you can look at and learn from. That's why I think the next cycle is probably going to be some equivalent of client-server computing reemerging.

Jeremy: Yeah. All right, well, I've got one more question for you, and this may be a little bit of a tongue-in-cheek question, because we talked a little bit about the merging of Lambda and Fargate and some of these other things. Just from your perspective, serverless five years from now: where do you see it going? This idea of utility computing, on-demand computing, without setting up servers and managing ops and some of these other things, do you see that as the future of serverless, with it just becoming the way we build applications? Or do you think it's got a different path?

Gojko: There was a tweet by Simon Wardley, you mentioned Simon Wardley earlier in the talk, a few days ago, where he mentioned some data. I'm not sure where he pulled it from, and this might be unverified, but generally Simon knows what he's talking about: Amazon itself is deploying roughly 50% of all the new apps they're building on serverless. I think five years from now, that way of running stuff, and I'm not sure if it's Lambda or some new service that Amazon starts and gives some even more confusing name that runs in parallel to everything else, but that kind of stuff, where the operator takes care of all the ops, which they really should be doing, is going to be the default way of getting utility compute.

I think a lot of these other things will probably remain useful for specialist use cases, where you can't really deploy in that way, or you need more stability, or it's not transient, and things like that. My best guess is, first of all, we'll get Lambdas that run for longer, and I assume that after we get Lambdas that run for longer, we'll probably get some ways of controlling routing to Lambdas, because you can already set up pre-provisioned Lambdas and hot Lambdas and reserved capacity and things like that.
When you have reserved capacity and you have longer-running Lambdas, the next logical thing is to have session stickiness, and routing, and things like that. And I think we'll get a lot of the stuff that was really complicated to do earlier, where you had to run EC2 instances or complicated networks of services, and you'll be able to do it in Lambda.

And I wouldn't be surprised if they launch a totally new service, some AWS cloud socket or whatever, something that implements the same principle, just in a different way, and that becomes the default way of running compute for lots of people. And I think GPUs are still a bit limited. I don't think you can run GPUs as a utility anywhere now, and that's limiting for a whole host of use cases. And again, it's not that they don't have the technology to do it, it's just that they probably haven't gotten around to it yet. But I assume that in five years' time you'll be able to do GPUs on demand, and GPU processing, and things like that. I think the buzzword itself will lose any special meaning, and it's just going to be a way of running stuff.

Jeremy: Yeah, absolutely. Totally agree. Well, listen, Gojko, thank you so much for spending the time chatting with me. Always great to talk with you.

Gojko: You, too.

Jeremy: If people want to get in touch with you and find out more about what you're doing, how do they do that?

Gojko: Well, I'm very easy to find online, because there are not a lot of people called Gojko. Type Gojko into Google and you'll find me. And gojko.net works, gojko.com works, gojko.org works, and all these other things. I was lucky enough to get all those domains.

Jeremy: That's G-O-J-K-O ...

Gojko: Yes, G-O-J-K-O.

Jeremy: ... for people who need the spelling.

Gojko: Excellent. Well, thanks very much for having me, this was a blast.

Jeremy: All right, yeah. And make sure you check out... you mentioned Narakeet. It's a speech thing?

Gojko: Yeah, for developers who want to build videos without hassle and want to put videos in continuous integration, things like that. Narakeet, that's like parakeet with an N, for narration. Check that out, and thanks for plugging it.

Jeremy: Awesome. And then check out MindMup as well. Awesome stuff. I've got all of it in the show notes. Thanks again, Gojko.

Gojko: Thank you. Bye-bye.
Special guest: Calvin Hendryx-Parker. Live stream: Watch on YouTube

Michael #1: AWSimple by James Abel
AWSimple is a more object-oriented interface on top of boto3 for some of the common "serverless" AWS services: S3, DynamoDB, SNS, and SQS. Features: a simple object-oriented API on top of boto3; one-line S3 file write, read, and delete; automatic S3 retries; locally cached S3 accesses; true file hashing (SHA512) for S3 files (S3's ETag is not a true file hash); DynamoDB full table scans (with a local cache option); DynamoDB secondary indexes; built-in pagination (e.g. for DynamoDB table scans and queries), so you always get everything you asked for; and the ability to set SQS timeouts automatically based on runtime data (they can also be user-specified). Caching: S3 objects and DynamoDB tables can be cached locally to reduce network traffic, minimize AWS costs, and potentially offer a speedup.

Brian #2: coverage and installed packages
I've covered coverage.py a lot on Test & Code, starting with episode 12, again on episode 147, and on many others. Except there's something I missed, hidden in plain sight, all this time: coverage --source, as well as pytest --cov if you're using the pytest-cov plugin, doesn't just take a path. "You can specify source to measure with the --source command-line switch, or the [run] source configuration value. The value is a comma- or newline-separated list of directories *or package names*. If specified, only source inside these directories or packages will be measured." - coverage.py docs (emphasis mine). Up to now I was doing a trick I picked up from I-don't-remember-where: I would run coverage from the top-level project directory, specify the source as the project source, and set a [paths] setting in .coveragerc mapping the project source and the site-packages directory to each other. Then the report would show the coverage of the source code, even though it was the site-packages code that was running. That trick is still nice for getting the report to show your project directory, which is usually a shorter relative path. However, it's not essential: you can just specify the source as the package name, without the above trick, and coverage will report the coverage of the installed package. That is usually good enough. Super cool.
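To make the package-name form concrete, here is a minimal .coveragerc sketch; "awsimple" is just a stand-in for whatever installed package you're measuring:

```ini
# .coveragerc
[run]
# A package name, not a directory path: coverage resolves the installed
# package (e.g. in site-packages) and measures only code inside it.
source = awsimple
```

Then run the tests as usual (for example, coverage run -m pytest), and the report covers the installed package with no [paths] remapping needed.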
Calvin #3: Finding Mona Lisa in the Game of Life with JAX by Atul Vinayak
Lots of great code examples. Showcases the speed increase you can get using JAX on a GPU versus unvectorized code on a CPU: the initial implementation took days of CPU time to get a rough result. JAX compiles numpy-style code into highly vectorized code that runs on a GPU; it requires some refactoring of the code to optimize for a highly parallel run on GPUs. The post includes a link to the notebook used for the project. "Running ~1000 iterations for a 483px wide Mona Lisa on the google colab GPU runtime only takes around 40 seconds!"

Michael #4: Python Package Index nukes 3,653 malicious libraries uploaded soon after security shortcoming highlighted
From Mark Little. Recall that Google's Python goal was around PyPI security. Related (from @tonny): Poison packages - "Supply Chain Risks" user hits Python community with 4000 fake modules. PyPI has removed 3,653 malicious packages uploaded days after a security weakness in the use of private and public registries was highlighted. "Developers are often advised to review any code they import from an external library though that advice isn't always followed." ← yeah. Last month, security researcher Alex Birsan demonstrated how easy it is to take advantage of these systems through a form of typosquatting that exploits the interplay between public and private package registries. Birsan set out to see whether he could identify the names of private packages used inside companies and create malicious packages using those library names to place in the public package registries, the indexes that keep track of available software modules. The names of private packages turned out to be rather easy to find, particularly in the Node.js/JavaScript ecosystem, because private package.json files show up rather often in public software repositories. So Birsan crafted identically named libraries that he designed to sneak system configuration data through corporate firewalls. The challenge then became getting applications that require private libraries to look for those names in a polluted public source. As it turns out, it's common for corporate software developers to rely on a hybrid configuration for their applications, one that references private internal packages but also supports fetching dependencies from a public registry, in order to ensure packages are up to date. The companies that Birsan managed to attack with this technique include Apple, Microsoft, Netflix, PayPal, Shopify, Tesla, Uber, and Yelp, and for his efforts he has been awarded at least $130,000 from bug bounty programs involving these firms. Birsan's success in carrying out such attacks should set off alarm bells: software supply chain attacks present a higher degree of risk than many threat scenarios because they have the potential to affect so many downstream victims. Makes me want to set up devpi + devpi-constrained just for internal projects. What to do? Don't do mass bogus uploads like this to prove your point; we appreciate the message you are trying to deliver, but it's already been documented, so you are just creating distracting work for other people who could more usefully be doing something else for the project. Don't choose a PyPI package just because the name looks right; check that you really are downloading the right module from the right publisher. Even legitimate modules sometimes have names that clash, compete, or confuse. Don't hook internal projects to external repositories by mistake; if you are using Python packages that you haven't published externally, then the one thing you can be sure of is that all external copies of "your" package are imposter modules, probably malware. Don't blindly download package updates into your own development or build systems; test and review everything you download before you approve it for use. Remember that packages typically include update-time scripts that run when you do the update, so malware infections could be delivered as part of the update process, not as part of the module source code that ultimately gets installed.
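One concrete guard against the hybrid-registry trap, in the spirit of the devpi idea above, is to stop pip from ever consulting the public index directly. A hedged sketch; the internal index URL is hypothetical:

```ini
# pip.conf (pip.ini on Windows)
[global]
# Resolve every install through the private index only. Mixing a public
# index back in via extra-index-url is exactly what makes
# dependency-confusion attacks possible.
index-url = https://pypi.internal.example.com/simple/
```

The private index can then mirror vetted public packages itself, so internal names can never be shadowed by a public upload.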
Brian #5: python-adventure
By Brandon Rhodes. "This is a faithful port of the 'Adventure' game to Python 3 from the original 1977 FORTRAN code by Crowther and Woods (it is driven by the same advent.dat file!) that lets you explore Colossal Cave, where others have found fortunes in treasure and gold, though it is rumored that some who enter are never seen again." "For extra authenticity, the output of the Adventure game in this mode, python3 -m adventure, is typed to your screen at 1200 baud." "Colossal Cave Adventure is the first known work of interactive fiction and, as the first text adventure game, is considered the precursor for the adventure game genre." - Wikipedia. Related: Zork ('77-'79), also an adventure game, was inspired by Colossal Cave Adventure ('75-'77). Zork appeared on Chuck - Brian's a Chuck fan. Brandon, can we have Zork also? Side note: the closest I got was Dungeons of Daggorath on the TRS-80. Not text-based. Early '80s.

Calvin #6: Exciting New Features in Django 3.2
From Haki Benita. The upcoming LTS release in the 3.x series, expected in April. The post highlights some interesting new features that you might not have noticed. New features: covering-index support on Postgres (performance plus!); timezones are hard, and TruncDate now helps keep you from pulling out the foot cannon; JSONObject database functions, helping the unstructured-data world keep using Postgres; Signal.send_robust() now logs exceptions so you don't have to; the new QuerySet.alias() method allows creating reusable aliases for expressions (performance!); the new display decorator makes creating calculated admin fields cleaner (a minimal sketch appears after these show notes); and value expressions now detect their type, more cleanup that lets the ORM figure things out. A notable missing feature is the async ORM, but that will be awesome when it lands. More are listed on the Django 3.2 release page.

Extras: Michael: Is Python on Mars? The FastAPI Website course is out: talkpython.fm/fastapi-web. Are you thinking of going to PyCon 2021? Over at Talk Python, we're giving away 5 tickets to the event: talkpython.fm/pycon2021. Be sure to join us @ pythonbytes.fm/youtube. Got a chance to speak to the medical field about Python and programming superpowers on the Finding Genius podcast. Calvin: The DjangoCon Europe 2021 CFP is open until 4/1: https://2021.djangocon.eu/talks/cfp/. Python Web Conf 2021: 4 tracks this year, 60 amazing speakers (almost 20% women). Tickets: Professional $199, Student $99. Grants available!

Joke:
/** Logger */ private Logger logger = Logger.getLogger();
// This is black magic // from // *Some stackoverflow link // Don't play with magic, it can BITE.
# For the sins I am about to commit, may Guido van Rossum forgive me
// Remove this if you wanna be fired
} catch(Exception ex) { // Houston, we have a problem }
int getRandomNumber() { return 4; // chosen by fair dice roll. // guaranteed to be random. }
https://twitter.com/LinuxHandbook/status/1368974401979383810
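On the display decorator item above: a minimal sketch of how the Django 3.2 decorator reads, with a hypothetical Order model and total field standing in for real ones:

```python
from django.contrib import admin

from myapp.models import Order  # hypothetical app and model


@admin.register(Order)
class OrderAdmin(admin.ModelAdmin):
    list_display = ("id", "total_display")

    # Before 3.2 you would set short_description and admin_order_field
    # attributes on the method by hand; @admin.display bundles both.
    @admin.display(description="Total value", ordering="total")
    def total_display(self, obj):
        return f"${obj.total:,.2f}"
```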
Simple Queue Service (SQS) is the message queuing service from Amazon Web Services (AWS). Message queues hold tasks that are waiting to be processed; applications consume those messages and carry out the corresponding actions, such as switching a component or device on or off, sending emails, processing payments, and so on. Amazon SQS offers two queue types: Standard and FIFO (First In, First Out). The main difference between them is that in a FIFO queue the order in which messages are sent and received is strictly preserved (the first one in is the first one out). In this video we present SQS in more detail, illustrating how workers consume the message queue and how to use the service in microservices and Node.js applications. Follow more videos in the "Programando" series:
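The video's examples are in Node.js; purely for illustration, here is the same worker pattern sketched in Python with boto3. The queue URL is hypothetical and handle() is a stand-in for the real work (sending an email, processing a payment, and so on):

```python
import boto3

sqs = boto3.client("sqs")

# Hypothetical queue URL for illustration.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"


def handle(body: str) -> None:
    # Stand-in for real work: send an email, process a payment, etc.
    print("processing:", body)


while True:
    # Long polling (WaitTimeSeconds) avoids a busy loop of empty responses.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for msg in resp.get("Messages", []):
        handle(msg["Body"])
        # Deleting the message acknowledges it; if the worker crashes first,
        # the message reappears after the visibility timeout and is retried.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```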
In this unique edition of the Cloudify Tech Talk Podcast, we take a dive into machine learning and AI as they relate to DevOps, all through the eyes of our special guest Joshua Odmark, CTO and founder of Pandio. In this discussion, Josh walks us through his experience with different distributed messaging systems such as Apache Kafka and SQS, and explains why he chose Apache Pulsar over the many other machine-learning and messaging-related systems.
This is episode #5 of the AWS Podcast en Español. In this episode we talk about serverless and go into a ton of detail on the AWS services you need to know to get started with serverless. 00:00 - Introduction 01:04 - RDS Proxy 02:13 - App2Container 03:44 - CodeGuru 05:40 - Introduction to today's topic 06:42 - A bit of history... 08:19 - The serverless mindset 14:21 - FaaS and BaaS 15:05 - Managed services - BaaS 22:10 - How to change the mindset 30:11 - Basic AWS serverless services 30:34 - AWS Lambda 36:55 - Lambda triggers 37:54 - API Gateway 41:03 - GraphQL with AppSync 44:04 - Event-driven architectures 46:11 - SNS, SQS and EventBridge 50:06 - Step Functions 53:56 - DynamoDB 59:59 - Choosing managed services 01:02:58 - Infrastructure as code 01:04:33 - Monitoring systems 01:07:22 - Deployment automation
Episode 68 of Bumpers (393 in the Reversim count) - Ran, Alon, and Dotan meet again on July 8, 2020, in the middle of the second wave, recording from home over Zoom... and nevertheless, Bumpers: Ran, Alon, and Dotan with a series of short items about what's been happening on the web, what caught our interest, interesting blog posts we ran into, interesting repos on GitHub, and more. So let's dive in...

Ran - Microsoft has open-sourced the software called GW-BASIC - who remembers what that is? It's a slight refinement of plain BASIC, the most basic there is. GW-BASIC was one of the most popular versions of BASIC - it may well be that if you know BASIC, you know this version. Microsoft published both a blog post and a GitHub repo containing the full source code of GW-BASIC.
(Dotan) Kudos for putting real history into Git - it says "38 years ago" here...
(Ran) They apparently really did reconstruct the history, because Git didn't exist 38 years ago... You can open all the ASM files (that's Assembly!) and read the instructions - actual machine instructions in which GW-BASIC was written. Fascinating for those who are into it - or just nostalgia for those who are less so.
(Dotan) You know what this means? (Alon) That we can start writing BASIC?
(Dotan) That too - and that we should start opening pull requests... Why is there no source folder?! Why is there no Make?!
(Ran) Totally - in terms of code quality...
(Dotan) There are no folders here at all! I'm looking for somewhere to drill into and there's nowhere to go.
(Alon) I don't know whether Windows even knew how to work with folders 38 years ago - actually this was still DOS...
(Dotan) Yes, there's a Code of Conduct here, and Contributing... contribute! Oh, actually - "Please do not send Pull Requests"...
(Ran) Although there are updates here and there - I saw one from two months ago, so not everything is exactly as it was 38 years ago, but most of it is.
(Dotan) And everyone is so well behaved - there isn't a single pull request, not open, not closed, nothing...
(Ran) Yes, well - they own the platform, let's not forget...

The Stack Overflow survey published recently - their annual survey for 2020. They put out a survey every year and it's always interesting and fun to read. This time the thing that stood out most for me is that visually it's gorgeous... simply beautifully designed. There's a lot of content too, but the first thing that catches the eye (yes...) is that it's beautifully designed, with interactive JavaScript and all sorts of animated charts. 65,000 developers from around the world answered the survey - you can see their demographics and so on. I don't remember any specific item about particularly interesting questions or answers, but there's a ton of information - everyone will find what interests them. Lots of information about demographic trends and industry trends - technologies and things like that. It's just fun to look at; visually it's very pretty, with lots of infographics of all kinds.

If you remember, in one of the previous episodes I mentioned I was reading a few books and hadn't found anything interesting yet - well, I found a good book that I do want to recommend. Dotan, remember? You said that as soon as there was something to recommend, we'd recommend it? So here it is - a book I'm still in the middle of and haven't finished, called An Introduction to Machine Learning, which is a field I've been working in lately. I got the book online and I'm reading it as an eBook. What I like about it: (1) it's written in very nice language, that is, unlike other books I've read that were in slightly "broken and annoying" English, this really is lovely prose that's fun to read; and (2) it has a great many exercises, at the end of every chapter, which really help you internalize the material. There are three kinds of exercises - one kind is "thinking exercises"; the second kind is "take pencil and paper and work through a calculation"; and the third kind is writing programs that implement a Perceptron or a classifier of one sort or another - and that really helps the material sink in. So the book is An Introduction to Machine Learning, published by Springer; the author is Miroslav Kubat, an American from the University of Miami in Florida. If you're after an introduction to Machine Learning, it's a pretty deep one, I have to say.
(Dotan) How pragmatic is it? Or, to ask it differently - do you need to know linear algebra beforehand?
To recall all sorts of things from university, or is it very pragmatic?
(Ran) It's not very pragmatic... it doesn't talk about libraries like Pandas or TensorFlow, it doesn't talk about tools at all. It works at the theoretical level - but the exercises are practical, meaning you actually write software. I'm doing the exercises in Clojure, out of my masochistic streak... You do get some programming experience - but it's not pragmatic in the sense of "getting to know real-world tools". In terms of background knowledge - I think undergraduate-level math is entirely enough, probably even less; maybe just the first year of the degree is enough: linear algebra at a not-too-advanced level, calculus at a not-too-advanced level - you need to understand what a derivative is, what an integral is, things like that... The first year of any of the science degrees at university gives you enough background, plus a little understanding of probability and statistics, maybe a little combinatorics, but not much. That's it... It's not an easy book, I have to say (because up to now it sounded great) - it requires slow reading and thought, so even if you have the background, it's not a novel... it's something that demands thought, depth, and above all practice. Anyway - I like the book. Recommended!
(Alon) Good to know... but since you haven't finished it, we can still spoil the ending for you! We'll reveal which program you write at the end... It's a book about Machine Learning, what could possibly happen?
(Ran) Is the classifier positive or negative?

A different but related (and pragmatic) topic - a blog post from GitHub describing how they do MLOps (that is, Machine Learning Ops) with GitHub Actions. GitHub Actions is a feature that's about a year old, maybe more - and it enables not just CI on top of GitHub but more general automation; for example, running some pipeline on every push. Here they describe various standard Machine Learning tasks, which they group under the general name "MLOps" - not that they invented the term, it already existed. For example - cleaning data, feature engineering, or running various frameworks (in this case they talk about binder) - things like that. And all of it happens in a pull request, which is nice. Often, when you're developing a model and want to optimize it, you want to confirm you didn't make something worse, that you didn't break anything - and it's nice that all of that can happen automatically. You think you've improved something - you commit some parameter change and then suddenly discover you broke something else... that's the whole concept behind Continuous Integration. In this context, MLOps is the answer, and they show an example of it built on GitHub Actions.
(Alon) That sounds really basic... what's new here?
(Ran) As a concept, for us as engineers, there's nothing new - but they do show how to integrate all the relevant tools: how to do data extraction, how to do feature engineering, how to run the model - all inside their containers. For anyone who's been doing CI for years there's nothing new, I agree - it's not a new concept; it's more practical, showing the actual tools.
(Alon) Funny that they use Argo for the workflow part, rather than something in-house... I didn't know anyone besides us used it...

A language called goplus - and yes, it's "Go with a bit more"... It's a kind of superset of Go: every Go program is also a goplus program, but goplus adds syntax that makes it look a bit like a script, a bit like Python in a sense. You don't have to declare a function; you can just write "a :=" and assign some array into it, and so on - it has a bit of a Python feel (or Ruby or JavaScript), but with very Go-like syntax - a bit like taking Go and turning it into a scripting language. A few notable features: you can simply run it as a kind of script; you don't need to write a function to run something. And, like Python, it has list comprehensions (or map comprehensions), which anyone who loves Python surely knows - for x in ...
where x > 3 - and you can do that over both arrays and maps, and it's very compact and nice. It's fully compatible with Go. And there are many more features. There's also a Playground - just as there's the Go Playground, there's a Go+ Playground, which is nice. The whole idea, according to what's written there, is to be friendly to data science: the tagline is The Go+ language for data science. Why "friendly to data science"? Because data scientists usually work inside notebooks, write short scripts, and want to see a result - and writing a full Go program is sometimes overhead that appeals less to data scientists, which is why Python is so attractive. So goplus brings some of Python's advantages over. Of course, the significant part is the libraries - some exist, but nowhere near Python's level; still, the language is already here. Is it sacrilege or a blessing? I don't know, everyone has their own take... whoever loves Go exactly as it is will probably call it sacrilege, but whoever wants to see Go evolve in different directions, this may be one of them. By the way - I don't see Go's maintainers adopting anything from here - it's an entirely separate language. Think of it like C and C++ - some people will simply stay with C forever and never move to C++, and the two don't mix. In any case - it's interesting, it's a repo that clearly had a lot of work invested in it - and it's also very popular on GitHub.
(Alon) There are some really interesting concepts here... the error handling is something I really connected with; it makes much more sense in my opinion. I think taking Go and bringing it to data science is interesting, but in my view it won't come from Go, it will come from Rust, because Facebook is pushing that hard. Still, it's an interesting and welcome concept.
(Ran) By the way - there are data science libraries in Go; they're not as rich as Python's, but they definitely exist. We'll see... Rust is interesting too - it could be that the core libraries, which today are written in C++, will tomorrow be written in Rust. But the end users... most data scientists don't write C++, they write Python or R, and I don't see them moving to Rust just like that, unless they really need to write actual libraries, which isn't most of the time.

Alon - let's start with one of the hot topics - the Black Lives Matter protests: people have started doing "housecleaning" in all sorts of languages. We'll start with Go, to stay on theme: a pull request removing all the references to whitelist vs. blacklist and to master/slave from Go's core library. I put this in as one of my first items, and then it started gaining traction in all sorts of other places, with such mentions being removed elsewhere too. The idea is that whitelist/blacklist is hurtful terminology, to be replaced by allowlist/blocklist - which, honestly, are also clearer names - and master/slave by primary/secondary, I think; I don't see it right now. In short - many languages have started changing, not just Go, and the terms we're used to are probably going to change in the near future. The only thing I haven't seen changed yet is the Git repo itself - the root branch is still master... but I haven't run into a protest in that direction yet.
(Dotan) I have to say I was floored here - I went through the commits, just to look, and hit a to-do: they changed the text in a to-do, where there was a split so you could have allowlist instead of whitelist - so if you've already gone in and changed it, why not just do the to-do?...
(Alon) If you look at fmt, for example, they changed blacklist to blocklist there...
(Dotan) Yes - but there's a comment there saying "to-do: this needs to be implemented differently", and if you're already refactoring the comment, just do the implementation already...
(Alon) Look, I wasn't in there...
(Dotan) But you're already there! You changed the whitelist to allowlist...
(Alon) In the end it's copy-paste-replace... yes, they made changes - you can go over the commits; some really are just comments (inside the GC it's a comment)... in loader.go they changed whitelist to allowlist.
(Dotan) So they have to go file by file and declare...
(Alon) Yes, there aren't many changes - but they did the work, and it's not the only place where this change was made.

A nice tweet I ran into - Ashley Willis asked: What's the best tech talk you've ever seen? The interesting thing is that there are hundreds of replies with links to talks, each person claiming theirs is the best talk they've ever seen. I skimmed through it and decided to keep the link - the next task is to filter it down and build a watch list, because it must be worth something: if everyone posts the talk they consider the best, there's bound to be a respectable list in there, "wisdom of the crowd" and all that. Looks like a very useful link for anyone searching for talks to watch.
(Dotan) Has anyone crawled it yet, or not?
(Alon) No... here's your opportunity - some links were posted twice, so you'll know what to start with.
(Ran) I wanted to say it's amazing, in terms of Israeli innovation, how we bring our personal touch to everything; simply amazing, the Jewish mind...
(Dotan) You just need to find a picture of someone presenting some slide, and then when you click...
(Ran) Yes, in the nineties that was one of the best tricks.
(Alon) You'd have made millions; plenty of liras would have come your way... In short, some of the talks here are ancient and some are from recent years; people posted talks from 1900-something, I'm not sure the presenter even had a computer back then, all sorts like that - and some really are from the last three or four years, so probably more relevant... looks cool to me.
(Dotan) I also don't see Remembering Joe here...
(Ran) Joe Armstrong's? I think I know it...
(Dotan) It was in one of our episodes (the cosmic 369!), what do you mean?!
(Ran) Fine, not everyone listens (sure, some just read).
(Alon) I actually think I saw Joe Armstrong there, pretty sure. In short - go over it, make a shorter list, we'll let Ran narrow it down some more - and then I'll take a look.
(Dotan) I'll take the old classics, you take the modern cool ones.
(Ran) And I demand that every list include at least five talks from past Reversim summits...
(Alon) Here's an opportunity to get into that list and start bombarding it... I'm asking all the speakers: everyone, post the link to your own talk. This is a call to speakers! Post the links to your talks there, and you'll boost the conference. Which conference? 2020?

An Israeli library - golang mediary - from Here Mobility: adding interceptors to http.Client. They sent it to me - I had a look - nice - happy to give it a plug. The idea is that you can hook into the HTTP request - before the request, after the request - and manipulate the request itself or the response. You can add logging, security bits, statsd... there are examples, tracing too... could be interesting. Looks cute for whoever needs it; a relatively young library - good luck! I liked it.

And we'll stay with Go, that's how it turned out this time - mockery is a library for creating mocks in Go. A very simple, cute library - if you're writing unit tests and looking for a way to mock code, it's worth a look. Nice, simple, lightweight, useful, and convenient.
(Ran) And one of the most popular ones - there are another one or two, but this is among the most popular.
(Alon) What's surprising is that even the popular ones aren't that popular... fewer than 2,000 stars is... either people don't write tests, that's also an option.
(Ran) I think you simply need far fewer mocks, especially in Go, mainly because of the interface approach - with a function that takes an interface, if the interface is "thin" enough, it's so easy to mock it yourself that you don't need any framework. When do you need a framework? Maybe never - but when would you want one? Either when the interfaces are relatively large and you don't want to mock everything by hand, or when you want to do spying: counting the number of calls, things like that - then you'd reach for a framework. In my own tests I simply create instances of the interfaces without any framework - more compact, more understandable in my view, and it doesn't require learning yet another framework. I think that's at least part of the explanation.
(Alon) Yes, but often there are more complex things... that works for the simple cases, but when you get to a third-party library, usually, with all kinds of connections and things going on...
...it's more complex. I once tried to mock S3, and it wasn't pleasant.
(Ran) In a case like that I really wouldn't take it on myself; I'd use a library. Or I'd use integration tests - for example, spin up a container that exposes an S3 interface. Know Testcontainers? They have loads of containers with all sorts of tools - S3 is one of them if I'm not mistaken, there's SQS, and of course all the standard things like databases of various kinds. So you can just spin up a container - and by the way, there's Go support too: in the test setup you can spin up a container and tear it down at the end, and sometimes that's more convenient than mocking it yourself. It runs more slowly, but on the other hand it's somewhat more faithful, in terms of the API.
(Alon) For integration tests that's the nicest - but that's already an integration test, not a unit test.
(Ran) True, it's no longer a unit test - but then, you're already talking to S3; is that still a unit test? A philosophical question... if you're already working with something heavy and external, it's probably not really a unit test anyway.
(Alon) That's obvious; we're getting into philosophy here...
(Dotan) It's a matter of taste, in the end - taste and balance.
(Ran) Totally - I'm not trying to settle what counts as an integration test versus a unit test, we'd never get out of that alive - just pointing out that you have a few options here. One is to do mocking, with mockery or other tools; a second is to take the interfaces and implement them yourself, which is convenient when the interfaces are relatively "thin"; and a third, if you're talking to a service, is to spin the service up in a container next door; or, heaven forbid, talk to the real service (the real S3, say), which in most cases is the least advisable. If you do go the container route - there's a framework called Testcontainers, with support for many languages - Java and Go and surely plenty more - which lets you spin up a container in the test setup and tear it down at the end of the test, and that integration is very nice.
(Alon) Really cute - and there's always the standing recommendation: the best test is a real test - a test in production! Why not take advantage?
(Ran) Famous last words...

Dotan - a library Apple released, or more of a framework, called ExposureNotification. To tie it to current events - they've essentially created a standard framework that models exposure to COVID-19. It was part of their recent announcements (the iOS 13.5 release) - they saw all kinds of governments and apps trying to model coronavirus exposures, on a map and so on - and they built a standard API for it. Now, if you want to build such an app, you can use this library, and it helps you here and there as well. I went in to read the interface, and there are some cool parts, which perhaps come from the language of medicine. For example, for a moment I was confused when it said "Transmission risk level" and "Signal" - I took that in the direction of radio and such...
(Ran) You were probably thinking Fourier series, but the intended meaning is biology...
(Dotan) Exactly... the transmission of the disease, maybe the signal of the disease? Anyway - it looks interesting, at least at the API level; you can read what corona looks like through an API... it's cool, and of course, if anyone wants to build a popular App Store app, this eases the pain...
(Ran) By the way - we haven't talked here about how these corona-tracking apps actually work, and maybe it's worth doing... Broadly, as far as I know, there are two kinds. One kind works by proximity - it uses Bluetooth and tracks who was near whom; for example, if you're in a public place, your Bluetooth "talks" to other people's Bluetooth, and that's how you know you were close to someone - and if they later turn out to be sick, that record exists. How is it stored, and how is the detection actually done? That's another story...
(Dotan) A library Apple released, or more like a framework, called ExposureNotification.
To tie it to current events - they built a standard framework that models exposure to COVID-19.
It's part of their recent announcements (the iOS 13.5 release) - they saw all kinds of governments and apps trying to model corona exposure on top of a map and so on - and they built a standard API for it.
Now if you want to build such an app - you can use this library, and it also helps you here and there.
I went in to read the interface, and there are some cool parts there, which maybe come from the language of medicine.
For example, I got confused for a moment when it said "Transmission risk level" and "Signal" - I took it in the direction of radio and such . . .
(Ran) You were probably thinking Fourier series, but they mean biology . . .
(Dotan) Exactly . . . they mean transmission of the disease; maybe the signal of the disease? In any case - it looks interesting, at least at the API level, that you can read what corona looks like through an API . . . it's cool, and of course if someone wants to build a popular app for the App Store, this eases the pain . . .
(Ran) By the way - we haven't talked here about how corona-tracking apps work, and maybe it's worth discussing . . . broadly, as far as I (Ran) know, there are two kinds. One kind works by proximity - it uses Bluetooth and keeps some record of who was near whom; if you're in a public place, your Bluetooth "talks" to other people's Bluetooth, and that's how you know you were close to someone - and if they later turn out to be sick, that record exists.
How is it stored and how is the discovery actually done? That's a different story . . . but at least in principle, at the physical level, detection is done via Bluetooth.
The other method is based on location - GPS and so on.
To the best of my knowledge, the Bluetooth method is called "the Singapore method", and it's the one both Apple and Google ended up adopting - when people talked about "Apple and Google joining hands in a common effort", that's what it was about, to the best of my knowledge: the Bluetooth-based method.
Except it won't be an app - it will be embedded right into the operating system, and it will be battery efficient and all that.
The method of the Israeli app called "Hamagen" ("The Shield"), which I assume many of you installed, is actually the location-based one. Each of the two has advantages and disadvantages:
Bluetooth - on one hand it really is more reliable: resolution-wise, Bluetooth should pick up at a range of a few meters, while infection is defined, I think, as being within two meters or less of a person for a quarter of an hour - and two meters or less is something Bluetooth usually knows and GPS less so, since (civilian . . .) GPS works at a coarser resolution.
On the other hand - Bluetooth can also pick up from ten or twenty meters away, depending on weather conditions, background noise and things like that.
Each of them can have false positives, and maybe false negatives too - I don't know the actual cases, but there may well be some.
So - I think it's interesting to talk a bit about the technology behind this, but I ask myself whether Apple and Google can really take Bluetooth and significantly lower the false-positive rate, because to be able to do that you need access right down to the physical layer, to really understand what the signal strength is and what the interference levels are and so on, to understand whether the person is actually near me or far away.
(Dotan) And that's a call to Apple and Google - send a letter to the editors (AWS have been listening for a while now . . .), but yes - it's cool.
(Alon) First of all - you heard it here first, because we always predict things, that's well known.
But wait - "a million years ago", when I worked at Intel, there were Bluetooth sensors and we could tell where a thing was by distances and Bluetooth strength - back then we meshed everything with Bluetooth and could say where the wafers were at any given moment by distances - so this is something that has existed for many years already.
(Dotan) Bluetooth signal strength, if I remember correctly, is available in iOS.
(Ran) Yes, it's available - the question is just how accurate it is. Sometimes the strength is "5" when you're two meters away and sometimes it's "5" when you're ten meters away . . . it's not accurate. Maybe you can say, in relative terms, who's near and who's far.
(Alon) Look (listen) - I can tell you that we may have been (literally) in lab conditions, but in lab conditions it was very stable . . . it was very clear and it worked very well, identifying distances of places. That was back in the days of "Bluetooth 0" or whatever technology it was, but Bluetooth has advanced quite a bit since, so maybe it's different now - but back then it worked, so I don't know what the problem is . . .
(Ran) The physics changed . . . seriously, I don't have deep knowledge here, so if any listener knows this domain, feel free to correct me; to the best of my understanding it simply depends a lot on the environment, and there really is a very significant difference between lab conditions and the real world - it depends on humidity, on the other devices nearby, and I assume on a few more parameters.
But again - I'm certainly no expert in the field, and I also just heard or read this somewhere.
In any case - I think the interesting thing is that there really are two models, and the answer may be some combination of the two, to reach a higher level of accuracy - but the two models, broadly, are one based on location services (like the Israeli Hamagen app), and the other based on Bluetooth. That's it, Se Tu.
(Alon) Let me just finish - the physics did change! In my day the world was round, and now they say it's flat, so that probably changed all of physics.
(Dotan) And then we'll become 5G . . .
A library and a tool - streamlit.
Python-based, or at least aimed at the Python community, or so it looks.
For those who know Swift Playgrounds - remember Apple's Swift announcement, and how it then showed up on the iPad too - where you write code and a visualization of your code appears and everything is interactive; you can drag sliders around, and your code changes according to the sliders?
So they took that concept - and did the same thing for Python.
At least from the ReadMe, it looks like the target audience is mostly data scientists and people who work with data.
I played with it a bit and it's great for anything - as long as you have sliders and interactive controllers and some Python function you want to play with, it can very quickly turn into a teaching tool, data science or not; great thing.
(Ran) I'm waiting to see it make its way into Jupyter Notebooks, because it begs for it. Many times I've wanted to build some visualization with a slider control or something like that - and so far I haven't found one, so maybe this is the answer; it just needs to be integrated into Jupyter.
(Dotan) I haven't seen anything like that . . . it does look like there's a company behind it, sort of . . . I assume they wanted to replace it, or be an alternative to it, because it does look a bit like Jupyter.
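To give a feel for the concept, here is a hedged, minimal streamlit sketch; the function being explored is an arbitrary example:

```python
# app.py - a minimal slider-driven streamlit app. Run with: streamlit run app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Playing with a function")

# The script re-runs top to bottom on every slider move,
# which is what makes the page feel interactive
freq = st.slider("Frequency", min_value=1, max_value=20, value=5)

x = np.linspace(0, 2 * np.pi, 500)
st.line_chart(pd.DataFrame({"sin": np.sin(freq * x)}, index=x))
```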
A bit of nostalgia - Cryengine, or rather Crytek - the company behind Cryengine, which is behind the game Crysis - open-sourced the code of the first Crysis engine.
We don't play the first Crysis anymore, but I remember it, because it's one of those games that changed the world and stay in your head, like Doom and the like (that much?).
So they opened up the code and I skimmed it a bit - a bit of C++, broadly, which looks like it was written by one or two developers, "in one stroke" as they say.
Interesting for nostalgia lovers - I like to look sometimes; I didn't build it, didn't compile it, and am really not going to, but sometimes it's fun to look at code written in that era, and it's nice.
(Ran) I'm looking at their commits, and it seems they have an interesting commit convention - say B! or T! or I! . . . I wonder what that is.
(Dotan) Honestly, I saw it and it looked like noise to me, but you're putting an interesting twist on it . . .
(Ran) There seems to be some convention for commits here that I'm trying to decipher . . . two commits back, say, there's XB! (at recording time, at least . . .)
(Alon) And also XI! . . . that's cool, now I have to figure out what it means . . . T! is just text; you can see it's just copyright changes and such, so now it's getting interesting.
(Ran) Maybe B! is Bug . . . and what's I!? . . .
(Alon) U! is probably User Interface . . . no, actually it's Undo . . . nice.
(Dotan) There are a few more interesting things here - there's a commit that fixes something that looks like a bug, from a month ago - now, this is Cryengine, this is from 2004 . . . what's going on here?
(Ran) They probably worked on it in order to release it as open source.
(Dotan) Could be . . . interesting; these are the parts I love digging through in very old code - you discover all kinds of things humanity no longer does.
(Alon) Now just search it for security holes and guess what carried over into the newer versions . . .
(Dotan) Yeah, heh . . .

The next item is backstage - a Spotify project they decided to open-source.
It's essentially a developer portal framework; they call it an "open platform for building developer portals".
I have to say I read that and really wanted to know what it is - and when I saw it, I really didn't want to see what it is . . .
I don't know, I'm still digesting it - it looks like a wiki combined with dashboards, and everything is oriented toward developers at Spotify: if you're a member of a squad you have the squad metrics in your face; if you want to read news, there's Spotify news; if you want to see service metrics, that's there too - your whole world, basically, lives in one place.
Maybe I'm a bit old school, but it . . . I connected with it a bit less; it gives off "a robot that works for a company", whose entire world is closed up in one place . . . when I read it, I thought I was going to see a developer portal in the sense of all the developer knowledge, the projects, and the tools I can use to speed up my work and so on - but what I actually see is a kind of "control mechanism", or "strings around the puppet".
But wander around it, it's cool.
(Alon) I still haven't figured out what I can do with it, whether it's good or bad - I need to watch the video, sorry to say.
(Dotan) You have a Gif, no need for a video . . .
(Alon) The Gif doesn't tell the whole story . . . in the Gif it actually looks cute: you build dashboards, there are all the metrics you need; if something's interesting there's something to look at . . . could be nice.
(Dotan) It's a bit of a fallacy, because first of all - if you train, or condition, people to look only in one place and never leave it, then fine, OK - there are all kinds of widgets here, and if someone put up a widget you were supposed to know about, and you didn't know it and never found out, then it doesn't exist.
(Alon) You can check the CI, check the metrics, check logs . . . you have one place instead of starting to wander around, and that's not bad.
(Ran) No - and companies do this anyway, come on - every company builds one of these for itself; every company I've been at built one, so it can be nice to start from something ready-made.
You could argue it has downsides, because once you build a portal like that, people don't look left or right - maybe, but on the other hand everyone builds one anyway, because I think the benefit outweighs that downside.
Now - is it a good portal? I don't know. But do you need a portal? I think yes; I'm fairly convinced you do.
(Dotan) That always exists - you have Jira and you have your own worlds . . . what I know is that people build these, but build them in the form of a tool, and here the feel I get is "this is your world, and your browser is locked into this thing and that's that". It's a feel thing, it's not really . . .
(Ran) Could be . . . I agree that it should have an API, that it shouldn't be UI-first but API-first, that every action you can do through the UI you can also do through the CLI with a client and so on.
Still, I think it's right to have some developer portal where everything developers need lives - you know, basic things like a service catalog, and metrics, and how to create a new service, and who the owner of each service is and what the dependencies between them are, things like that.
By the way - not all of it is basic; some of these things are actually complex, but it's all useful in my eyes.
Every company I've been at ended up building one of these, so I think it's nice to start from something, but I don't know - you should give it a test run and see whether it's really the right tool for you.
(Dotan) No, now it looks . . . less so, but do try it.
(Alon) Don't listen to him! Spotify - you can't trash them; Spotify Family has finally arrived in Israel (non-sponsored link . . .), so I'm asking - don't trash them!
(Dotan) I'm not trashing . . . it's great, an amazing tool!

The next library - rich - does colors in Python.
I have to say this is finally a library that looks good, for anyone who wants to create a developer experience a notch beyond what standard Python gives you.
It does everything in colors, the whole palette - tables and spinners and progress bars; it does syntax coloring in the terminal and more and more - it even renders markdown.
Cool; the moment you take a library like this, you have the freedom to do whatever you like: inside the terminal you can render Markdown, you can output tables.
I assume cool tools will be built on top of this library and thanks to it.
I really liked it - and it makes you want to build new command-line tools that look good in Python.
Use it!
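A small sketch of the rich features just mentioned - styled text, a table, and terminal-rendered Markdown; the table contents are made up:

```python
# A quick tour of rich: styled text, a table, and Markdown in the terminal
from rich.console import Console
from rich.markdown import Markdown
from rich.table import Table

console = Console()

# Inline color/style markup
console.print("[bold red]Alert![/bold red] Something [italic green]pretty[/italic green]")

# A simple colored table
table = Table(title="Show notes")
table.add_column("Item")
table.add_column("Verdict")
table.add_row("rich", "use it!")
console.print(table)

# Markdown rendered straight to the terminal
console.print(Markdown("# Heading\n\n- bullet one\n- bullet two"))
```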
A library called texthero - it does text processing.
The emphasis is on it being light and easy - I liked its text cleaning, but it has more capabilities.
You install it and you immediately have all kinds of popular algorithms for working on text.
Not too deep, but not too shallow either - simple and really nice.

For whoever doesn't like Docker's documentation, there's docker-cheat-sheet (under Docker's GitHub org).
It has all the documentation from the site - flattened into a single Markdown file, all of it in one repository.
Nice - it's easier to search, and also more convenient to keep open all the time . . .
(Alon) It says "4 months ago" here . . .
(Dotan) Yes, the official documentation probably gets updated more frequently, but this has the basics, and most of what you occasionally forget is there.

Another library called mimalloc - a somewhat more low-level and hardcore topic; we've talked about it a bit in the past - it's a library for use as an allocator, which Microsoft released.
It has essentially become the allocator with the best performance on the market, more or less.
Where is it relevant? For libraries or tools built on C++, and, in my personal space - Rust.
We're already seeing differences that are fairly significant - it does memory allocation 5 or 6 times faster than what you get by default.
There are also 10x and 20x numbers versus other alternatives.
Whoever works on performance, or cares about performance, and has a code base that does a ton of allocations and a ton of "tough" work in Rust, can swap out their allocator with a few minutes of work and see whether it improved their numbers.
In other languages I assume it's the same.
Bottom line - it's becoming something less experimental and already looks quite ready for use.

Another item I liked precisely because of its feel - hackingtool: a tool for hackers like in the 90s!
Someone took a Python script and built these prompts and a huge logo and so on - and all it does is launch a bunch of other scripts; it just made me laugh.
(Alon) Wait . . . right now we're working from home, but at the office, with a window like that permanently open it's . . . listen - a hit!
(Dotan) Yes, really 90s, it really reminded me of that - it's the kind with menus, you press and the next menu appears, with a different title and another menu, until you finally reach the thing you want to run and tell it "run!" . . . pure 90s and nostalgia.
There are endless hacking tools out there, really a lot, and he took just a few - I don't know if it's the best tool for hacking or pen-testing, but it definitely brings back the most memories.
(Ran) I remember there used to be Linux distributions actually dedicated to this, with all the tools pre-installed . . .
(Dotan) Ah - there are! There still are.
(Ran) They still make those?
(Dotan) Sure . . . what happened with them is that, for example, KALI and Backtrack became companies, in some way; security companies somehow funded or acquired them, and an entity formed that, beyond shipping a Linux distribution with loads of security tools, is a kind of thought leader in the pen-testing world, and part of what it does is release the distribution called, say, KALI.
So not only did they survive - they multiplied, and there are quite a few by now.
In "the days of bad internet" I had one of those, as standard kit, in my bag - and when I needed internet I would "obtain" it that way.
The WiFi of those days wasn't that sophisticated either - a few minutes and you have someone's password, their WiFi's . . .
Today it's less relevant; it's harder to pull off.
(Alon) Tell me - did you actually run it? Is there music too, like back in the day?
(Dotan) No . . . no music, but that's a great idea for a pull request.
(Ran) Does it come with a hoodie?
(Alon) I think it's our turn to get a hoodie . . .
(Dotan) Amazing ideas; I think we should add them to pull requests - "add music!"

And another one - EasyOCR: someone took a neural network, everything we know about neural networks and deep learning and text recognition, wrapped it in a library, and created an OCR that recognizes quite a few languages.
I think the emphasis is on ease of use, or whatever we should call it.
Basically, in three lines - you have OCR; the thing we'd usually do with tesseract, the free one, right? Here you can grab it, try it, and see whether it gives a meaningful advantage over the other free OCRs.
(Ran) A quick reminder for those who forgot - OCR is Optical Character Recognition - the ability to "read" text.
(Dotan) You give it an image - you get back text.
And while we're on the subject - the "first generation" OCRs took fonts and were somehow coupled to fonts in the way they recognized text; today it's neural networks, so the difference is pretty serious.
In any case - EasyOCR can do it in English and also in slightly more exotic languages: Chinese, Thai and so on. Interesting.
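The "three lines" claim is roughly fair; here is a hedged sketch following EasyOCR's documented API (the image path is a made-up example, and models download on first use):

```python
# OCR in a few lines with EasyOCR
import easyocr

reader = easyocr.Reader(['en'])  # add codes like 'th' or 'ch_sim' for more languages

# readtext returns a list of (bounding_box, text, confidence) tuples
for box, text, confidence in reader.readtext('street_sign.png'):
    print(f"{confidence:.2f}  {text}")
```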
One more item - gitqlite: this one gave me another "how has nobody done this before?" moment - someone took a Git repo and took SQLite . . . we had an item like this once, where someone takes data, loads it into SQLite, and builds a querying tool around it . . .
I think it was even someone Israeli; it was called q, no? If I remember correctly . . .
(Ran) Harel Ben-Attia wrote q, which indeed takes data, puts it into SQLite, and then queries it.
(Dotan) Right, and there it was JSON, if I remember correctly, and here it's Git commits, or Git in general - I assume that's how he built it: took the Git log, did a bit of parsing on it, or maybe something slightly more sophisticated, pushed it into SQLite into a few tables, and now you have a command-line tool where you can run queries over your repo, or over Git - which is pretty cool.
One of those ideas so simple that you wonder "how did nobody think of this before?"
(Ran) In q's case, I think it had several kinds of inputs - JSON, CSV, and also the output of commands, which it could parse as tables.
(Dotan) Cool . . . we should check what he did in gitqlite, but maybe you could stream it into q . . . actually no, it's already SQLite . . .

And an (almost) last item - practical-python: I don't know if it's that much of a highlight, because there are so many resources for learning Python, but when I looked at it something jumped out at me - the name of the person who made it is David Beazley - and anyone who did Python in the 2000s knows David Beazley; Ran knows him for sure . . .
(Ran) Don't know him . . .
(Dotan) He did the Python Cookbook and was a real pioneer in the world of Python teaching.
What he's doing here is opening up his course, which he says he has taught more than 400 times, his own training - he's making it free and open on GitHub, and you can go and take the course.
There are exercises, and he claims, and I assume he's right, that this course is his teaching, polished over something like 20 years.
Worth at least a look at what's in there.

And a truly final item - HEY!
It borders on drama, and I assume you've heard what happened with HEY! . . .
(Ran) No - tell us!
(Dotan) Ah, OK . . . so there's the new email called HEY!, if we can call it that, which DHH . . .
(Ran) It's an email client?
(Dotan) I don't know about email client, it's literally email . . . it replaces Gmail in some sense; DHH and Basecamp and that whole group put it out.
It's not a Basecamp product exactly, but it's part of the Basecamp family of tools, I think, in the productivity area.
What he says is that he released an email that doesn't belong to any big entity - I don't know whether to add "evil", but that's probably his intent - and that it supports privacy and so on.
But the thing that blew up is that DHH, as is his habit, has a very well-known business mantra, and when he submitted the HEY! app to the Apple App Store, it violated the in-app purchase policy - you got an app you can't use at all unless you go to a separate site, unrelated to the App Store - HEY!'s - pay there, and then you can use the app . . . and Apple - of course this contradicts their terms & conditions; you can't ship an app that can't be activated without paying, with the payment made inside Apple's ecosystem - so they banned the app . . .
And then began something like two weeks of Twitter-terror by DHH against Apple, and so many threads and crazy conversations developed on Twitter that it pretty much "broke Twitter" - and in the end Apple backed down.
And that's it - that was HEY . . .
(Ran) Wait - so they're letting him do purchases outside the App Store? Inside the app?
(Dotan) They sort-of gave in, and he also sort-of gave in - but it was . . . if you read Twitter during those days, it looked like there's a war on and nobody is climbing down from the tree - so in the end he made a sort-of-free version, and they sort-of waived their rigid rules.
Someone even opened a site . . . there was some VP at Apple who said "You download the app and it doesn't work", and then someone opened a site called YouDownloadTheAppAndItDoesntWork.com - with screenshots of all the apps you download and they don't work.
The joke is that they don't really not work . . .
Among others there were Spotify and Netflix and so on, all on this model - at Apple they said those are "Readers" and not exactly apps, but Gmail is also a Reader . . .
In short, all kinds of complicated philosophical debates developed there.
Some claim the whole thing was a PR stunt by DHH, because it generated a ton of publicity - beyond Twitter it made waves on all the "geek news sites", but that's . . .
What's left is to try HEY and try to replace the email you know and get for free - with a paid one.
(Ran) Nice, so you've supplied today's drama, indeed.
(Alon) I still don't understand why I need to replace my email after this whole story . . .
(Dotan) Like I said - you're welcome to replace your email with a different email - for a fee!
(Alon) Instead of for free?
(Dotan) Yes.
(Ran) I think that's the part he doesn't get, Dotan, but we'll explain it to him afterwards. One for the collection.
(Dotan) Kidding aside - what he's selling, in the end, is privacy: for $99 a year you get privacy - he blocks trackers and such for you, and you get an email address at hey.com, which is, like, cool . . .
Can we open a cynicism paragraph for a moment? Yes.
Before the launch, while DHH was heating up all of Twitter, someone replied to him saying "I already got into HEY, and the address is Hey@username" - flipping the domain and the username, which is, like . . . in the end you're paying for a three-letter domain, that's what's happening here.
(Alon) Right - and then try dictating it to some service over the phone: "Where should we send it?" - "To Alon@Hey.com" - "What?! H?" - people don't get it; forget it, who cares about three letters?
(Dotan) There was a nice bit there - I got an invite relatively early, and the first thing you do when you get an invite earlier than everyone else is try to grab names . . .
So there's a nice thing there with short addresses - say, two letters cost a crazy amount, but three letters are already $350 a year, I believe - and then you start wondering . . .
Of course I tried "DHH" - taken . . . then I tried DNH, which looks a bit like DHH - and it was available.
So, just so you know, phishers out there - interesting things can be done . . . but no - I didn't pay.
(Ran) You didn't pay $350?
(Dotan) No - I didn't go for it.
(Alon) Someone once had a script that grabbed short names on Twitter, but let's stop here.
(Ran) I can already see the next blog post: "You buy a three-letter name for $350 - and it doesn't work!"
(Dotan) ".com"
(Ran) OK, we've digressed a bit - time for the funny items, to lighten the mood after this serious drama we've had here . . .
The first - a tweet by bradfitz, one of the most famous developers in the world - he was on the Go core team, wrote Memcached back in the day, and more.
He wrote on Twitter that he's just out of a long day of interviews and wants to vent his frustration - so here's the question: "Print the largest even integer in an array of integers." - and please supply only wrong answers.
And it got funny . . . people suggested all kinds of ideas for how to print the largest even number in an array of integers.
For example - one answer is "print(a)" - just print the whole array, and the largest even number will probably be printed in there somewhere . . . it works.
Another answer - loop from 0 to MaxInt and print all the numbers - in this case too, the largest even number in the array will probably get printed somewhere.
In short - there were all kinds of clever answers, like "first you need to create a model and then train it", and there was a Shell answer with grep and sort . . . all kinds of very amusing answers; you're welcome to go over the thread on Twitter.
And yes - some also gave references to answers on Stack Overflow . . . they made a feast of it. Nice, amusing.
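For contrast with the intentionally wrong answers, the straight solution really is a one-liner; a sketch in Python rather than Go:

```python
# The boring, correct answer to bradfitz's question
def largest_even(numbers):
    # raises ValueError if the array contains no even integer at all
    return max(n for n in numbers if n % 2 == 0)

print(largest_even([3, 8, 1, 14, 7]))  # -> 14
```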
The next item is ypp, or Yid++ as they write it - "the oylem's first programming shprach".
Those of you who know Yiddish - feel free to translate . . .
And yes - Yid++ is actually a compiler from Yiddish to C++, if I'm not mistaken; "the world's first", or something like that.
You're welcome to go read some Yid++ source code - for example - be_soymech_on is #include <iostream>, and holding shitta std is using namespace std - whoever remembers their C++ will surely see the resemblance.
There's also bh main() bli_ayin_hara, which is essentially void main(), and it returns "bh", which I assume is "b'ezrat hashem" ("with God's help").
And at the top, of course, it says "BSD" in big letters - which is "b'siyata dishmaya", naturally . . .
(Dotan) That's also legally confusing . . .
(Ran) I'm sure that's no coincidence . . .
(Alon) I wonder whether it compiles on Shabbat . . .
(Dotan) You took mine! I've been waiting to say that!
(Alon) Sorry, can you delete my last sentence? (No.) Dotan, what did you want to say?
(Dotan) Does it compile on Shabbat? Will the compiler work on Shabbat?
(Ran) Let's read a few more gems from the language - for example - be_machriz << is cout <<, for printing.
There's another Yiddish word here I don't recognize . . . in short - amusing.
(Dotan) Meanwhile I'm also looking at the code - and we should give credit to the person who wrote it: a guy named Moshe Shor from Haifa, from the Technion - kudos to you . . .
There are also cool things in the code, like a C++ file called ani_maymin.cpp . . . in short, the code is packed with things like this.
It's so thorough that I'm starting to believe it's real . . . I see there's even a kosher certification from some rabbi for the code base . . . it's getting . . . we should check that with him.
(Ran) There's a kosher certificate for the code - nice, he took it all the way - well done, Moshe!
(Ran) I see there are two of them - Moshe, and someone else who contributed - Yechiel Kalmenson, who's actually from New York.
(Dotan) I think it's real, it looks real to me; it's truly a kosher language . . .
(Ran) Totally - great work, guys, if you're listening.
And that's it - a great laugh; go read the code a bit. I'm sure you'll recognize a lot of Yiddish even if you're not a fluent Yiddish speaker; you'll definitely recognize a lot.
That's all - this is where we end.
Thank you, Alon and Dotan, it was amusing and enlightening as usual - see you next time.
The episode file is here; happy listening, and many thanks to Ofer Forer for the transcript.
In this episode, Michael compares the available messaging options on AWS. The goal of messaging is to decouple the producers of messages from consumers. The messaging pattern allows us to process the messages asynchronously. This has several advantages. You can roll out a new version of consumers of messages while the producers can continue to send new messages at full speed. You can also scale the consumers independently from the producers. You get some kind of buffer in your system that can absorb spikes without overloading it.
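As a hedged sketch of the pattern Michael describes, here is producer/consumer decoupling over SQS with boto3; the queue name and message body are illustrative, and credentials/region are assumed to come from the environment:

```python
# A minimal sketch of messaging-based decoupling with SQS and boto3
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="orders")["QueueUrl"]

# Producer: fire-and-forget; it never waits for a consumer
sqs.send_message(QueueUrl=queue_url, MessageBody='{"order_id": 42}')

# Consumer: polls at its own pace, so it can be scaled independently
# and the queue absorbs spikes as a buffer
messages = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
).get("Messages", [])
for msg in messages:
    print("processing", msg["Body"])
    # Deleting the message acknowledges successful processing
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```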
newline Podcast - Sudo Steph

Nate: [00:00:00] Steph, just tell us a little bit about your work and kind of your background with AWS and what you're doing now.

Steph: [00:00:06] Yes, so I work as an engineer for a managed services provider called Second Watch. We basically partner with big companies that use AWS, or sometimes other clouds like Azure, and manage their cloud infrastructure. These are often companies whose focus may not be technology, or cloud stuff in general, and we're able to optimize the cost of everything, make sure that things are running reliably and smoothly, and work with AWS directly to keep people ahead of the curve when new stuff is coming out. It changes so much, you know, it's important to be able to adapt. Personally, my role is developing automation for our internal operations teams. We have a bunch of really smart people who are always working on customer-specific AWS issues, and we see some of the same issues pop up over and over: security, auditing, cost optimization. So my team builds automations that we can distribute to all of these clients, each of whom maintains their own AWS account. It's theirs. And we make it so that we're able to distribute these automations the same way into all of our customers' accounts. The idea is, and it really wouldn't be doable without serverless, that everyone has to own their own infrastructure, right? Your AWS account is yours, and so are your resources; for security reasons you don't want to put all of your stuff in somebody else's account. But actually managing them all the same way can be really difficult, even with scripts, because permissions in different places have to be granted through AWS identity and access management. So serverless gave us the tool we needed to be able to say, at scale: hey, we came up with a little script that runs on an hourly basis to check how much usage these servers are getting, and if they're not production servers, spin them down when they're not in use to save money. Little things like that. When it comes to operations, AWS Lambda is just so good for it, because it's all about, like I said, doing things reliably, in ways that can be audited and logged, and for a decent price. Background-wise, I used to work at AWS, in AWS support actually, and I supported some of their DevOps products: OpsWorks, which is based on Chef, for configuration management; Elastic Beanstalk; and AWS CloudFormation specifically. After working there for a bit, I really got to see how it's made and what the underlying system is like, and it was just crazy to see how much work goes into all this, just so you can have a supposedly easier-to-use front end. But serverless kind of changed all that for the better, luckily.

Amelia: [00:02:57] So it sounds like AWS has a ton of different services. What are the main ones, and how many are there?

Steph: [00:03:04] So I don't think I can even count anymore, because they release new ones all the time. So hundreds at this point... well, maybe not hundreds, maybe a little over a hundred would be a better estimate. I mean, EC2, which is Elastic Compute, is the bread and butter.
Historically, AWS is just, they're virtualized servers, basically. So, EC2. The thing that made AWS really special from the beginning, and that made cloud start to take over the world, was the concept of auto scaling groups, which are basically definitions you attach to EC2 that allow you to say: hey, if I start getting a lot of traffic on this one type of server, create a second server that looks exactly the same and load balance the traffic across them. So when they say scaling, that's basically how you scale: EC2 with auto scaling groups and elastic load balancers, distributing the traffic out. The other big thing, besides the scalability you get with auto scaling groups, is redundancy. There's this idea of regions within AWS, and within each region there are availability zones. A region is, you can think of it as the general place where the data centers are located, within a small degree. So Virginia is one, right? That's us-east-1, the oldest one. Another one is in California, but they're all over the world now. The idea is you pick a region to minimize latency, so you pick the region that's closest to you. And then within the region there's the idea of availability zones, which are basically discrete physical locations of the servers. You administer them the same way, but they're protected: if a tornado runs through and hits one of your data centers, and you happen to have your servers distributed between two different availability zones, you'll still be able to serve traffic. One will go down, but the elastic load balancer will notice that it's not responding and send the traffic to the other availability zone. So those are the main concepts behind EC2; those are what you need to use it effectively.

Nate: [00:05:12] So an EC2 instance would be like a virtual server. I mean, it's not quite a Docker container, I guess we're getting into nuance there, but it's basically a server that you would have command-line access to. You could log in and do more or less whatever you want on an EC2 instance.

Steph: [00:05:29] Right, exactly. And so it used to be that AWS used what was called Xen virtualization to do it. And that's just like you can run Xen on your own computer: you can get a computer and set up a virtual machine, almost just like they used to do it. They're constantly putting out new ways of virtualizing more efficiently, so they do have new technology now, but it's not something that was really... I mean, it was well known, but they took it to a new kind of scale, which made it really impressive.

Nate: [00:05:56] Okay, so EC2 lets you have full access to the box that's running, and you might load balance requests against it. How does that contrast with what you do with AWS Lambda and serverless?

Steph: [00:06:09] So with EC2, you still have to, you know, either secure-shell in or, for Windows, use RDP or something to actually get in there. You care about what ports are open; you have security groups for that. You care about all the stuff you would normally care about with a server: is it patched and up to date, what's the current memory and CPU usage? All those things don't go away on EC2 just because it's cloud, right? When we start bringing serverless into the mix, suddenly they do go away. I mean, there are still a few limitations.
For instance, a Lambda has a limit on how much memory it can process with, just because they have to have a way to keep costs down and define the units of them and where to put them, right? But at its core, what a Lambda is... it actually runs on a Docker container. You can think of it like a pre-configured Docker container with some pre-installed dependencies. So for Python, it would have the latest version of Python that it says it has, it would have boto, it would have the stuff it needs to execute, and it's also structured like basic Linux, so there's a /tmp and you can write files there, right? But really it's like a Docker container that runs underneath it on a fleet of EC2. As far as availability zone distribution goes, that's already built into Lambda, so you don't have to think about it. With EC2 you do have to think about it, because if you only run one EC2 server and it's in one availability zone, it's not really different from just having a physical server somewhere with a traditional provider.

Nate: [00:07:38] So there are these two terms, serverless and Lambda. Can you talk a little bit about the difference between those two terms and when to use each appropriately?

Steph: [00:07:48] Yeah, so they are in a way sort of interchangeable, right? Because serverless technology just means the general idea of: I have an application, I have it defined in an artifact, let's say a zip from our git repo, right? So that application is my main artifact, and then I pass it to a service somewhere. It could be, I don't know, Google App Engine; that's a type of serverless technology, and AWS Lambda is just the specific AWS serverless technology. But the reason AWS Lambda is, in my opinion, so powerful is that it integrates really well with the other features of AWS. Permissions management works with AWS Lambda, API gateway... there are a lot of really tight integrations you can make with Lambda, so it's not like you have to keep half of your stuff in one place and half of your stuff somewhere else. I remember when Heroku was really big: a lot of people, maybe they were maintaining an AWS account and they were also maintaining a bunch of stuff in Heroku, and they're just trying to make it work together. And even though Heroku does use AWS on the back end, or at least it did back then, it can just make things more complicated. But the whole serverless idea of the artifact is that you make your code general; it's like a little microservice, in a way. So I can take my serverless application and, ideally, it's just Python. If I write it the right way, getting it to work on a different serverless back end, like... I think Azure has one, and Google App Engine, isn't really too much of a change. There are some changes to permissions and the way that you invoke it, but at the core of it, the real resource is just the application itself. It's not how many units of compute it has, how much memory, what the IP address rules are, and all that, you know.

Nate: [00:09:35] So what are some good apps to build on serverless?

Steph: [00:09:39] Yes.
So you can build almost anything today on serverless. There's actually so much support, especially with AWS Lambda, for integrations with all these other kinds of services, that the stuff you can't do is getting more limited. But there is a trade-off with cost, right? Because to me, the situation where it shines, where I would for no reason ever choose anything but serverless, is when you have something that's kind of bursty. So let's say you're making a report generation tool that needs to run, but really you only run it three times a week or something. Things that need to live somewhere, need to be consistent, stable, and available, but you don't know how often they're going to be called. And even if they're called only a small number of times, the cool thing about serverless is that you're charged per every 100 milliseconds of time that it's processing. With EC2, you're charged in units that... it used to be by the hour; I think they finally fixed it and it's down to smaller increments. But if you can write it efficiently, you can save a ton of money just by doing it this way, depending on what the use case is. Some stuff, like if you're using API gateway with Lambda, can actually be a lot more expensive than the Lambda will be. But especially if you need redundancy, you don't have to worry, because otherwise you have to run a minimum of two servers just to keep them both up for an AZ-outage situation. You don't have to worry about that with Lambda. So anything with lower usage: 100 percent. If it's bursty: 100 percent use Lambda. If it's one of those things where you just don't have many dependencies, then Lambda is a really good choice as well. Especially infrastructure management: I think Werner Vogels wrote something recently about how serverless-driven infrastructure automation is going to be the key point in making places that use cloud use it more effectively. And that's one group of people, a big group of people: big companies that already use AWS and aren't getting everything out of it that they thought they would. Sometimes there are serverless use cases that already exist out there; there's a serverless application repo that AWS provides, and AWS Config integrations, so that you can trigger a serverless action based off of some other resource's actions. So, let's say your auto scaling group scaled up and you wanted to notify somebody. There are so many things you could do with it; it's super useful for that. But even if you're coming at it from a blank slate and you want to create something, there are a lot of really good use cases for serverless. If, like I said, you're not really sure how it's going to scale, you don't want to deal with redundancy, and it fits into a fairly well-defined scope, you know, this is pretty much all Python and it works with minimal dependencies, then it's a really good starting place.

Nate: [00:12:29] You know, you mentioned earlier that serverless is very good for when you have bursty services, in that if you were to do it on EC2 and also get that redundancy, you're going to have to run at least two EC2 instances, 24 hours a day, paying for those. Plus you're also going to pay for API gateway.
Do you pay hourly for API gateway?

Steph: [00:12:53] API gateway... it would work the same either way, but in that case you would pay for something like a load balancer.

Nate: [00:12:59] What is API gateway? Do you use that for serverless?

Steph: [00:13:02] All the time. So, API gateway...

Nate: [00:13:04] Yeah, tell us the elements of a typical serverless stack. So I understand there's Lambda, for example. Maybe you use CloudFront in front of your Lambda functions, which maybe store images in S3. Is that like a typical stack? And can you explain what each of those services is and how you would do that?

Steph: [00:13:22] Yeah, so you're on the right track here. So, okay, a good way to think about it is this: AWS has published something, with a lot of documentation on it, called the serverless application management standard, SAM. If you look at that, it defines the core units of a serverless application: the function; an API, if you want one; and basically any other permission-type resources. So in your case, let's say it was something like the really basic tutorial AWS provides: someone wants to upload an image for their profile, and you want to scale it down to a smaller image before you store it in your S3, just so they're all the same size and it saves you a ton, all that. If you're creating something like that, the AWS resources you would need are basically an API gateway, which acts as the definition of your API schema. If you've ever used Swagger or OpenAPI, these standards where you basically just define, in JSON: it's a REST API, you do a GET here, a POST here, this resource name. That's a standard you might see a lot outside of AWS, and API Gateway is basically a way to implement that same standard in a way that works with AWS. So that's how you can think of API gateway. It also manages stuff like authentication integration: if you want to enable OAuth or something on something, you could set that up at the API gateway level.
So it's a really neat service. So you would have that kind of input go into your API gate.We would decide where to route it. Right. So in a lot of cases here, you might say that the Lambda function is where it gets routed to. That's considered the integration to it. And so basically API gateway passes it all of this stuff from the requests that you want to pass it. So, you know, I want to give it the content that was uploaded.I want to give it the IP address. It originally came from whatever you want to give it.Nate: [00:16:47] What backend would people use for API gateway other than Lambda? Like would you use an API gateway in front of an EC2 instance?Steph: [00:16:56] You could, but I would use probably a load balancer or application load balancer and that kind of thing.There's a lot of things you can integrate it for. Another cool one is, AWS API calls. It can proxy, so it can just directly take input from an API and send it to a specific API call if you want to do that. That's kind of some advanced usage, but Lambdas are kind of what I see is the go-to right now.Nate: [00:17:20] So the basic stack that we're looking at is you use API gateway to be in front of your Lambda function, and then your Lambda function just basically does the work, which is either like a writing to some sort of storage or calculating some sort of response. You mentioned earlier, you said, you know the Lambda function it can be fronted by an API if you want one. And then you mentioned, you know, there's other ways that you can trigger these Lambda functions. Can you talk a little bit about like what some of those other ways are?Steph: [00:17:48] Yeah, so actually those are really cool. So the cool thing is that you could trigger it based off of basically any type of CloudWatch event is a big one.And so CloudWatch is basically a monitoring slash auditing kind of service that AWS provides. So you can set triggers that go off when alarms are set. So. It could be usage, it could be, Hey, somebody logged in from an IP address that we don't recognize. You could do some really cool stuff with CloudWatch events specifically. And so those are one that I think for like management purposes are really powerful to leverage. But also you can do it off of S3 events, which are really cool. So like you could just have it, so anytime somebody uploads a. Let's say it was a or CI build, right? You're doing IA builds and you're putting your artifacts into a S three bucket, so you know this is released version 1.1 or whatever, right?You put it into an S3 bucket, right? You can hook it up so that when ever something gets put into that S3 bucket. That another action is that takes place so you can make it so that, you know, whenever we upload a release, then you know, notify these people. So now an email or you can make it so that it, you know, as complicated as you want, you can make it trigger a different kind of part in your build stage.If you have things that are outside of AWS, you can have it trigger from there. There's a lot of really cool, just direct kind of things that you don't need. An API for. An S3 is a good one. The notification service, SNS it's used within AWS a lot that can be used. The queuing service AWS provides called SQS.It works with, and also just scheduled events, which I really like because it replaces the need for a crown server. 
So if you have things that run, you know, every Tuesday or whatever, right, you can just trigger your Lambda to do that from just one configuration point, you don't have to deal with anything more complicated than that.Nate: [00:19:38] I feel like that gives me a pretty good grounding in the ecosystem, in the setting. Maybe we could talk a little bit more about tools and tooling. Yeah, so I know that in the JavaScript world, on like the node world, they have the serverless framework, which is basically this abstraction over, I think it's over like Lambda and you know, Azure functions and Google up.Google cloud probably too. Do they have like a serverless framework for Python or is there like a framework that you end up using? Or do you just generally just write straight on top of Lambda?Steph: [00:20:06] So that's a great question and I definitely do recommend that even though there is like a front end you could do to just start, you know, typing code in and making the Lambda work right.It's definitely better to have some sort of framework that. Integrates with your actual, like, you know, wherever you use to store your code and test it and that kind of thing. So serverless is a really big one, and that's, it's kind of annoying because serverless, you know, it also refers to the greater ecosystem of code that runs without managing underlying servers.But in this particular case, Serverless is also like a third party company in tooling, and it does work for Python. It works for, a whole budget. That's kind of like the serverless equivalent in my head of like Terraform, something that is kind of meant to be kind of generic, but it offers a lot of kind of value to people just getting started. If you just want to put something in your, read me that says, here's how to, you know, deploy this from Github. You know, serverless is a cool tool for that. I don't personally like building with it myself just because I find that this SAM, which is Serverless Application Model, I think I said management earlier, but it's actually model.I just looked that up. I feel like that has everything I really want for AWS and I get more fine grain control. I don't like having too much obstruction and I also don't like. When you have to install something and something changes between versions and that changes the way your infrastructure gets deployed.That's a pet peeve of mine, and that's why I don't really use Terraform too much for the same reason. When you're operating really in one world, which in my case is AWS, I just don't get the value out of that. Right. But with the serverless application model, and they have a whole Sam CLI, they have a bunch of tools coming out.So many examples on their Github repos as well. I find that it's got really everything. I want to use plus some CloudFormation plugs right into that. So if I need to do anything outside of the normal serverless kind of world, I can do that. So it's better to use serverless than to not use anything at all. I think it's a good tool and really good way to kind of get used to it and started, but at least my case where it really matters to have super consistent deployments where I'm sharing between people and accounts and all of that. And I find that SAM really gives me the best kind of best of both worlds.Amelia: [00:22:17] So, as far as I understand it, serverless is a fairly new concept.Steph: [00:22:22] You know, it's one of those things it's catching on. 
Recently, I felt like Google app engine candidate a long time ago, and it was kind of a niche thing for awhile, but it recently it, we're starting to see. Bigger enterprises, people who might not necessarily want bleeding edge stuff start to accept that serverless is going to be the future.And that's why we're seeing all this stuff come up and it's, it's actually really exciting. But the good thing is it's been around long enough that a lot of the actual tooling and the architecture patterns that you will use are mature. They've been used for years. Their sites you've been using for a long time that.You don't know that it's serverless on the back end, but it is because it's one of those things that doesn't really affect you unless you're kind of working on it. Right. But it's new to a lot of people, but I think it's in a good spot where it's more approachable than it used to be.Nate: [00:23:10] When you say that there's like a lot of standard patterns, maybe we could talk about some of those.So when you write a Lambda function and code, either with like Python or Java script or whatever, there are bloods, they say Python because you use Python primarily right? Well, maybe we could talk a little bit about that. Like why do you prefer Python?Steph: [00:23:26] Yeah, so just coming from my background, which is, like I said, I did some support, did some straight dev ops, kind of a more assisted mini before the world kind of became a more interesting place kind of background.Python is just one of those tools that is installed on like every Linux server in the world and works kind of predictably. Enough people know it that it's, it's not too hard to like. Share between people who may not be, you know, super advanced developers, right? Cause a lot of people I work with, maybe they have varying levels of skills and Python's one of those ones you can start to pick up pretty quickly.And it's not too foreign really to people coming from other languages as well. So it's just a practicality thing for a lot of it. But there's also a lot of the tooling that is around. Dev ops type stuff is in Python, like them, Ansible for configuration management, super useful tool. You know, it's all Python.So really there's, there's a lot of good reasons to use Python from, like in my world it's, it's one of the things where you don't have to use one specific language, but Python is just, it has what I need and it gets, I can work with it pretty quickly. The ecosystems develop. There's still a lot of people who use it and it's a good tool for what I have to do.Nate: [00:24:35] Yeah, there's tons, and I was looking at the metrics. I think Python is like, last year was like one of the fastest growing programming languages too. There's a lot of new people coming into Python,Steph: [00:24:44] and a lot of it is data science people too, right? People who may not necessarily have a strong programming background, but there's the tooling they need in a Python already there.There's community, and it sucks that it's not as scary looking as some other languages, frankly. You know.Nate: [00:24:58] And what are some of the other like cloud libraries that Python has? Like I've seen one that's called like BotoSteph: [00:25:03] Boto is the one that Amazon provides as their SDK, basically. And so every Lambda comes bundled with Boto three you know, by default.So yeah, there was an older version of ODA for Python too. But Boto three is the main one everyone uses now. So yeah, Bodo is great. I use it extensively. 
It's pretty easy to use, a lot of documentation, a lot of examples, a lot of people answering questions about it on StackOverflow, but I'm really, every language does have an SDK for AWS these days, and they all pretty much work the same way because they're all just based off of.The AWS API APIs and all the API APIs are well-defined and pretty stable, so it's not too much of a stretch to use any other language, but Bono's the big one, the requests library in Python is super useful just because it makes it easier to deal with, you know, interacting with API APIs or interacting with requests to APIs.It's just all about, you know, HTP requests and all that. Some of the new Python three. Libraries are really good as well, just because they kind of improve. It used to be like with Python 2, you know, there's URL lib for processing requests and it was just not as easy to use. So people would always bundle a third party tool, like requests, but it's getting better.Also, you know, Python, there's some. Different options for testing Py unit and unit test, and really there's just a bunch of libraries that are well maintained by the community. There's a kazillion on PyPy, but I try to keep outside dependencies from Python to a total minimum because again, I just don't like when things change from underneath me, how things function.So it's one of the things where I can do a lot without. Installing third party libraries, so wherever I can avoid it, I do.Nate: [00:26:47] So let's talk a little bit about these patterns that you have. So Lambda functions generally have a pretty well defined structure, and it's basically through that convention. It makes it somewhat straightforward to write each function. Can you talk a little bit about like, I don't know, the anatomy of a Lambda function?Steph: [00:27:05] Yeah, so at its basic core, the most important thing that every Lambda function in the world is going to have is something called a handler. And so the handler is basically a function that is accessed to begin the way that it starts.So, any Lambda function when it's invoked. So anytime you are calling it, it's called invoking a Lambda function. It sends it parameters that are event. An event is basically just the data that defines, Hey, this is stuff you need to act on. And it sends it something called context, which a lot of time you never touched the context object.But it's useful too, because AWS provides it with every Lambda and it's basically like, Hey, this is the ID of the currently running Lambda function. You know, this is where you're running. This is the Lambdas name. So like for logging stuff, context can be really useful. Or for stuff where it's like your function code may need to know something about where it is.You can save yourself time from, you don't have to use like an environment. They're able, sometimes if you can look in the context object. So at the core it's cause you have at least a file, you can name it whatever you want. A lot of people call it index and then within that file you define a function called handler.Again, it doesn't have to be called handler, but. That makes it easy to know which one it is, and it takes that event and context. And so really, if that's all you have, you can literally just have your Lambda file be one Python file that says, you can say def handler takes, you know, object and then return something.And then that can be it. 
As long as you define that index dot handler as your handler resource, which is, that's a lot of words, but basically we need to find your Lambda within AWS. The required parameters are basically the handler URI, which is the name of the file, and then a.in the name of the handler function.So that's at its most basic. Every Lambda has that, but then you start, you know, scoping it out so you can actually know, organize your code decently. And then it's just a matter of, is there a read me there. Just like any other Python application really, you know, do you have a read me? Do you want to use like a requirements.txt file to like define, you know, these are the exact third party libraries that I'm going to be using.That's really useful. And if you're defining it with SAM, which I really recommend. Then there's a file called template.yaml And that's just contains the actual, like AWS resource definition, as well as any like CloudFormation defined resources that you're using. So you can make a template.yaml as the infrastructure kind of as code, and then everything else, just the code as code.Nate: [00:29:36] Okay. So unpacking that a little bit, you'll invoke this function and they'll basically be two arguments. One is the event that's coming in the event in particular, and then it'll also have the context, which is sort of metadata about the context in which this request is running. So you mentioned some of the things that come in the context, which is like what region you're in or what the name of the function is that you're on.What are some of the parameters in the event object.Steph: [00:30:02] So the interesting thing about the event object. Is, it can be anything. It just has to be basically a Python dictionary or basically, you know, you could think of it like a JSON, right? So it's not predefined and Lambda itself doesn't care what the event is.That's all up to your code to decide what is it, what is a valid event, and how to act on it. So API gateway if you're using that. There's a lot of example events, API gateway will send and so if you like ever try it, look at like the test events for Lambda, you'll see a lot of like templates, which are just JSON files with like expected outputs.But really it can be anything.Nate: [00:30:41] So the way that Lambda's structured is that API gateway will typically pass in an event that's maybe like the request was a POST request, and then it has these like query parameters or headers attached to it. And all of that would be within like the request object. But the event could also be like you mentioned like CloudWatch, like there's like a CloudWatch event that could come in and say, you basically just have to configure your handler to handle any of the events you expect that handler to receive.Steph: [00:31:07] Yeah, exactly.Nate: [00:31:09] So let's talk a little bit more about the development tooling. How in the world do you test these sorts of things? Like with, do you have to deploy every single time or tell us about like the development tooling that you use to test along the way.Steph: [00:31:22] Yeah. So I'm, one of the great things about SAM and there's some other tools for this as well, is that it lets you test your Lambdas locally before you deploy it, if you want.And the way that it does that is, I mentioned earlier that Lambda is really at its core, a container, like a Docker container running on a server somewhere. Is, it just creates a Docker container that behaves exactly like a Lambda would, and it sends your events. 
So you would just define basically a JSON with the expected data from either API gateway or whatever, right?You make a test one and then it will send it to that. It'll build it on demand for you and you test it all locally with Docker. When you like it, you use the same tool and it'll package it up and deploy it for you. So yeah, it's actually not too bad to test locally at all.Nate: [00:32:05] So you create JSON files of the events that you want it to handle, and then you just like invoke it with those particular events.Steph: [00:32:12] Yeah, so basically like if I created it like a test event, I would save it to my repo is tests slash API gateway event.json Had put in the data I expect, and then I would do like a SAM. So the command is like SAM, a local invoke, and then I would give it to the file path to the JSON, and it would process it.I'd see the exact same output that I would expect to see from Lambda. So it'll say, Hey, this took this many milliseconds to invoke the response code was this, this is what was printed. So it's really useful just for. It's almost a one to one with what you would get from Amazon Lambda output.Amelia: [00:32:50] And then to update your Lambda functions.Do you have to go inside the AWS GUI or can you do that from the command line.Steph: [00:32:57] yeah, no, you can do that from the command line with Sam as well. So there's a Sam package and Sam deploy command. It's useful if you need to use basically any type of CII testing service to like manage your deployments or anything like that.Then you can get a package it and then send it the package to your, Whatever you're using, like Gitlab or something, right. For further validation and then have Gitlab deploy it. Like if you don't want people to have deployed credentials on their local machine, that's the reason it's kind of broken up into two steps there.But basically you just do a command, Sam deploy, and what it does is it goes out to Amazon. It says, Hey, update the Lambda to point to this as the new resource artifact to be invoked. And if you're using and which I think it's enabled by default, not actually the versioning feature, it actually just adds another version of the Lambda so that if you need to roll back, you can just go to the previous one, which is really useful sometimes.Nate: [00:33:54] So let's talk a little bit about deployment. One of the things that I think is stressing when you are deploying Lambda functions is like, I have no idea how much it's going to cost. How is it going to cost to launch something, and how much am I going to pay? And I guess maybe you can kind of calculate if you estimate the number of requests you think you're going to get, but how do you approach that when you're writing a new function?Steph: [00:34:18] Yeah, so the first thing I look at is what's the minimum, basically timeout, what's the minimum memory usage? So number of invocations is a big factor, right? So like if you have free tier, I think it's like a million invocations you get, but that's like assuming like a hundred under a hundred milliseconds each.So when you just deploy it, there's no cost for just deploying it. You don't get charged until it's invoked. If you're storing like an artifact and as three, there's a little cost for you keeping it in as three. But it's usually really, really minimal. So the big thing is really, how many times are you give it?Is it over a million times and or are you not on free tier? 
The costs, like I said, get batched together, and it's actually really pretty cheap just in terms of number of invocations. The bigger places where you can normally save costs are when it's over-provisioned for how much memory you give it, right? I think the smallest unit you can give it is 128 megabytes, and that can go up to like two gigabytes, maybe more now. So if you have it set where, oh, I want it to use this much memory, and it really never is going to use that much memory, that's kind of wasteful. Or if it does use that much, that's a sign something's wrong.
Nate: [00:35:25] Because you pay... you configure beforehand, like, we're going to use max 128 megabytes of memory, and then it's allocated on invocation or something like that. And then if you set it too high, you're going to pay more than you need to. Is that right?
Steph: [00:35:40] Yeah. Well, and it's more like, I'll have to double check, because it actually just shows you how much memory you used each time a Lambda is invoked. So you can sort of measure if it's getting near that, and if you think you need more, it might give an error if it isn't able to complete. But in general, I haven't had many cases where the memory has been the limiting factor. I will say that the timeout can sometimes get you. Because if a Lambda's processing forever... let's say you're behind API Gateway. A lot of times API Gateway has its own sort of timeout, which is, I think, like 30 seconds to respond. And if your Lambda is set to, you know, you give it five minutes to process, it always gets five minutes of processing. If, let's say, you programmed something wrong and there's a loop somewhere and it's going on forever, it'll waste five minutes computing. API Gateway will give up after 30 seconds, but you'll still be charged for the five minutes that Lambda was kind of doing its thing.
Nate: [00:36:29] So it's like, I know that AWS's services and Lambda are created by world-class engineers. It's probably the highest-performing infrastructure in the world. But as a user, sometimes it feels like there's a giant Rube Goldberg machine, and I have no idea about all of the different aspects that are involved. How do you manage that complexity? Like, when you're trying to learn AWS, let's say someone who's listening to this wants to try to understand it, how do you go about making sense of all of that difficulty?
Steph: [00:37:02] You really have to go through a lot of the docs. Videos of people showing you how they did something aren't always the best, just because they kind of skirt around all the things that went wrong in the process, right? So it's really important just to look at the documentation for what all these features are before you use them. The marketing people make it sound like it's super easy and go, and to a degree it really is, like, it's easier than the alternative, right? It's where you put your complexity that's the problem.
Nate: [00:37:29] Yeah, and I think that part of the problem I have with their docs is they're trying to give you every possible path, because they're an infrastructure provider, and so they support these very complex use cases. And so it's like the world's most detailed choose-your-own-adventure. It's like, oh, have you decided that you need to take this path? Go to path A, or this one, path B, or path C. There are so many different paths you can kind of go down.
It's just a lot when you're first learning.
Steph: [00:37:58] It is. And sometimes the blog posts have better, kind of, actual tutorial content for real use cases. So if you have a use case that's general enough, a lot of times you can just Google for it, and there'll be something that one of their solution architects wrote up about how to actually do it, from a, you know, user-friendly perspective. The thing with all the options is that you need to be aware of them too, just because the way they interact can be really important if you ever do something that hasn't been done before. And the reason why it's so powerful, and why it takes all these super smart people to set up, is actually that there are just so many variables that go into it, and you can do so much with it, that it's so easy to shoot yourself in the foot. It always has been, in a way, right? But it's just learning how to not shoot yourself in the foot and use it with the right agility. And once you get that down, it's really great.
Amelia: [00:38:46] So there are over a hundred AWS services. How do you personally find new services that you want to try out? How does anyone discover any of these different services?
Steph: [00:38:57] What I do is, you know, I get the emails from AWS whenever they release new ones, and I try to keep up to date with that. Sometimes I'll read blog posts that I see people writing about how they're using some of them. But honestly, a lot of it's just based off of, when I'm doing something, I keep an eye out if there's something I wished that it did. Like, I use AWS Systems Manager a lot, which is basically, you can think of it sort of like a config management and orchestration tool. It's a little agent you can install on servers, and you can, you know, automate patching and all this other little stuff that you would do with Chef or Puppet or other config management tools. And it seems like they keep announcing services that are really just tie-ins to existing ones, right? Which is like, oh, this one adds, for instance, the secrets management in the parameter store. A lot of them are really just integrations with other AWS services, so it's not as much. The really core ones that everyone needs to know are, you know, EC2 of course, Lambda, API Gateway, and CloudFormation, because CloudFormation is basically the infrastructure-as-code format that is super useful for structuring everything. And I guess S3 is the other one.
Nate: [00:40:15] Yeah. Let's talk about CloudFormation for a second. So earlier you said your Lambda function is typically going to have a template.yaml. Is that template.yaml CloudFormation code?
Steph: [00:40:26] So at its core, yes. But the way you write it is different. So how it works is that the SAM templating language is defined to simplify what you would do with CloudFormation. With CloudFormation you have to put a gazillion variables in, and there are some ways to make that easier. Like, I really like using a Python library called troposphere, where you can actually use Python to generate your CloudFormation templates for you. And it's really nice just because, you know, I'll know I need a loop for something, or I'll need to fetch a value from somewhere else, and it's great to have that kind of flexibility with it.
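[Editor's note: a minimal sketch of the troposphere pattern Steph mentions. The bucket resources and stage names are hypothetical; the point is just that ordinary Python control flow, like a loop, can generate the template.]

    # generate a CloudFormation template in Python (pip install troposphere)
    from troposphere import Template
    from troposphere.s3 import Bucket

    t = Template()
    for stage in ("Dev", "Prod"):          # loop over stages, as plain Python
        t.add_resource(Bucket(f"ArtifactBucket{stage}"))
    print(t.to_json())                     # emit the JSON CloudFormation expects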
The SAM template is specifically what they call a transform of CloudFormation, which means that it executes against the CloudFormation service. The CloudFormation service receives it, kind of turns it into the core format it understands, and executes on it. So at the core of it, it is executing on top of CloudFormation. You could usually create a mostly equivalent CloudFormation template, but there's more to it. There are a lot of reasons why you would want to use SAM for serverless specifically, just because they add so many niceties around, you know, permissions management that you don't have to think about as much, and shortcuts, and it's just a lot easier to deal with, which is a nice change. But the power of CloudFormation is that if you wanted to do something that maybe SAM didn't support, or that's outside the normal scope, you could just stick a CloudFormation resource definition in it, and it would work the same way, because it's running against it. It's one of those services where people... sometimes it gets a bad rap because it's so complicated, but it's also super stable. It behaves in a super predictable way, and I think learning how to use it when I worked at AWS was really valuable.
Nate: [00:42:08] What tools do you use to manage security when you're configuring these things? So earlier you mentioned IAM, which is... I don't know what it stands for.
Steph: [00:42:19] Identity and access management.
Nate: [00:42:20] Right. Which is like a configuration language where we can configure which accounts have access to certain resources. Let me give you an example. One question I have is, how do you make sure each system has the minimum level of permissions, and what tools do you use? So for example, I wrote this Lambda function a couple of weeks ago. I was just following some tutorial, and they said, yeah, make sure that you create this IAM role as one of the resources for launching this Lambda function, which, that's great. But then, how do I pin down the permissions when I'm granting that function itself permission to create new IAM roles? It was like I basically just had to give it root permission, at my skill level, because otherwise I wasn't able to create IAM roles without the authority to create new roles, which just seems like root permissions.
Steph: [00:43:13] Yes. So that's super risky, honestly, like, super risky.
Nate: [00:43:17] Yeah. I'm going to need your help.
Steph: [00:43:19] But it is a thing that, in that case, you can limit down with the right kind of definition. So, IAM. It's really powerful, right? The original use case behind IAM roles was for servers, so that if you had an application server and a database server separately, you could give them separate IAM roles, so that they could have different things they could do. Like, you never really want your database server to, maybe, interface directly with, you know, an S3 resource, but maybe you want your application server to do that or something. So it was nice because it really let you limit down the scope for servers, because otherwise you have to leave keys lying around. You don't have to keep keys anywhere on the server if you're using IAM roles to access that stuff. So anytime you're storing an AWS secret key on a server, or in a Lambda, you kind of did something wrong.
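[Editor's note: a sketch of the path-scoped, least-privilege pattern Steph goes on to describe just below, expressed with boto3. The account ID, role name, and policy name are hypothetical placeholders.]

    # let a role create roles only under its own /build/ path, not account-wide
    import json
    import boto3

    iam = boto3.client("iam")

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["iam:CreateRole", "iam:PutRolePolicy"],
            "Resource": "arn:aws:iam::123456789012:role/build/*",  # path-limited
        }],
    }

    iam.put_role_policy(
        RoleName="build-server",                       # hypothetical role
        PolicyName="create-roles-under-build-path",
        PolicyDocument=json.dumps(policy),
    )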
The thing there is, AWS doesn't really care about those keys. It just looks: is it a key? Then do it. But when you actually use IAM policies, you can say it has to be from this role, it has to be executed from this service. So you can make sure that it's Lambda doing this, and not somebody trying to assume Lambda's credentials, right? There's so much you can do to limit it down with IAM. So it was really good to learn about that. And all of the AWS certifications do focus on IAM specifically, so if anyone's thinking about taking an AWS certification course, a lot of them will introduce you to that and help a lot with understanding how to use those correctly. But for what you talked about, a function that creates a role for another function, right? What you would do in that kind of case is, there's an idea of IAM paths. Basically they act as namespacing for IAM permissions, right? So you can make an IAM role that can create roles only underneath its own namespace, within its own path.
Nate: [00:45:20] When you say namespaces, do they inherit the permissions that the parent has?
Steph: [00:45:28] Depends. So it doesn't inherit itself. But let's say that I was making a build server, and my build server had to use a couple of different roles for different pieces of it, for different steps, because they used different services or something. So we would give it the top-level one of build. And then on my S3 bucket, I might say, allow upload for anyone whose path has build in it. So that's the idea, that you can limit on the other side what is allowed. And of course, it's one of those things where you want to, by default, blacklist as much as possible, and then whitelist what you can. But in reality it can be very hard to go through some of that stuff. So you just have to try to, wherever you can, minimize the risk potential and understand what's the worst case that could happen if someone came in and was able to use these credentials for something.
Amelia: [00:46:16] What are some of the other common things that people do wrong when they're new to AWS or DevOps?
Steph: [00:46:22] One thing I see a lot is people treating the environment variables for Lambdas as if they were private, like secrets. So they think that if you put an API key in through the environment variable, that that's kind of secure. But really, I worked in AWS support, and anyone would be able to see that if they were helping you out in your account. So it's not really a secure way to do that. You would need to use a service like Secrets Manager, or you'd have some way to encrypt it before you put it in, and then the Lambda would decrypt it, right? So there are ways to get around that. But using environment variables as if they were secure, or storing secure things within your Git repositories that get pushed to AWS, is a really big thing that should be avoided.
Nate: [00:47:08] I'm pretty sure that I put an API key in mine before.
Steph: [00:47:11] So yeah, no, it's one of the things people do.
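[Editor's note: a minimal sketch of the Secrets Manager pattern Steph recommends over plain environment variables. The secret name and its JSON layout are hypothetical.]

    # fetch a secret at runtime instead of baking it into environment variables
    import json
    import boto3

    secrets = boto3.client("secretsmanager")

    def get_api_key():
        # "prod/my-app/api-key" is a hypothetical secret name
        resp = secrets.get_secret_value(SecretId="prod/my-app/api-key")
        return json.loads(resp["SecretString"])["api_key"]  # assumes a JSON secret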
And it's one of those things where, for a lot of people, you know, maybe nothing will go wrong and it's fine. But if you can just reduce the scope, then you don't have to worry about it, and it just makes things easier in the future.
Amelia: [00:47:27] What are the new hot things that are up and coming?
Steph: [00:47:30] So I'd say there are more and more uses for Lambda at the edge, for IoT integration, which is pretty cool. So basically, Lambda at the edge is where you can run your Lambdas on small computers. Like, you know, just think of it as a Raspberry Pi kind of thing, right? So you could take a small computer and put it, you know, maybe somewhere it doesn't have a completely consistent internet connection. So maybe you're doing a smart vending machine or something, think of it like that. Then you could actually execute the Lambda logic directly there, deploy it to there, and manage it from AWS whenever it does have a network connection. It just reduces latency a lot, and it lets you test your code locally and then deploy it out. So it's really cool for IoT stuff. There's been a ton of stuff happening in the machine learning space on AWS too, too much for me to even keep on top of. But a lot of the stuff around Alexa voices is super cool, like Polly. If you've played with an Alexa-type thing before, it's cool, but you could just write a little Lambda program to actually generate, you know, whatever you want it to say, in different accents, in different voices, on demand, and integrate it with your own thing, which is pretty cool. I mean, I haven't had a super great use case for that yet, but it's fun to play with.
Amelia: [00:48:48] I feel like a lot of the internet of things are like that.
Steph: [00:48:52] Oh, they totally are. They really are. But yeah, it's just one of the things you have to keep an eye out for. Sometimes the things that excite me, because I'm dealing so much with enterprisey kind of stuff, are not really exciting to other people. Because it's like, yay, patching has a way to lock it down to a specific version of this at this time! You know, it's not really exciting, but yeah.
Nate: [00:49:14] And I think that's one of the things that's interesting talking to you. Like, I write web apps, so I think of serverless from a web app perspective. It's like, oh, I'm going to write an API that will, you know, fix my images on the way up or something. But a lot of the uses that you alluded to are using serverless for managing other parts of your infrastructure. Like, you've got a monitor on some EC2 instance that sends out a CloudWatch alert that then responds in some other way within your infrastructure. So that's really interesting.
Steph: [00:49:47] Yeah, no, it's just been really valuable for us. And like I said, I mentioned the IAM stuff. That's what makes it all possible, really.
Amelia: [00:49:52] So this is totally unrelated, but I'm always curious how people got into DevOps. Because I do a lot of front-end development, and I feel like it's pretty easy to get into front-end web development, because a lot of people need websites.
It's fairly easy to create a small website, so that's a really good gateway. But I've never, like, on a weekend, wanted to spin up a server or any of this.
Steph: [00:50:19] Honestly, for me, a lot of it was my first job in college. I was basically part-time tech support slash sysadmin. And I always loved Linux. The reason I got into Linux in the first place is that I realized, when I was in high school, that I could get around a lot of the school's, you know, spy software that won't let you do fun stuff on the internet or with the software, if you just use a live-boot Linux USB. So part of it was just, I was using it to get around stuff, just curiosity about that kind of thing. But when I got my first job, that kind of sysadmin-type thing, it became a necessity. Because, you know, when you have limited resources... it was me, another part-time person, and one full-time person, and hundreds of people whose email and everything we had to keep working. It becomes a necessity, because you realize that with all the stuff you had to do by hand back then, you can't keep track of it all. You can't keep it all secured with a few people. It's extremely hard. And one way people dealt with that was, you know, offshoring, or hiring other people to maintain it. But it was kind of cool at the time to realize that the same stuff I was learning in my CS program about programming, there was no reason I couldn't use it for my job, which was support and admin stuff. So I think Chef was the first tool I got introduced to where I really was like, wow, this changes everything. You know, because you would write little Ruby files to do configuration management, and then you'd run the Chef agent and, you know, your servers would all be configured exactly the same way. And it was testable, and there was all this really cool stuff you could do with Chef that I had been trying to do with, you know, Bash scripts or just normal Python scripts. But then Chef kind of gave me that framework for it. And I got a job at AWS, where one of the main components was supporting their AWS OpsWorks tool, which was basically managed Chef deployments. And so that was cool, because then I learned how that works at super high scale, and what other things people use. And right before I, you know, got my first job as a full-time DevOps person was when they were releasing the beta for Lambda. So I was in the little private beta for AWS employees, and we were all kind of just like, wow, this changes a lot. This'll make our jobs a lot easier, and, you know, in a way it will reduce the need for some of it. But we were so overloaded all the time. And I feel like a lot of people from an ops perspective know what it feels like: there's so much going on, and you can't keep track of it all, and you're overloaded all the time, and you just want it to be clean and not have to touch it. And to do less work, DevOps was kind of the way forward. So that's really how I got into it.
Amelia: [00:52:54] That's awesome. Another thing I keep hearing is that a lot of DevOps tasks are slowly being automated. So how does the future of DevOps look, if a lot of the things that we're doing by hand now will be automated in the future?
Steph: [00:53:09] Well, see, the thing about DevOps is really, it's more of a goal. It's an ideal.
A lot of people, if they're DevOps purists, will tell you that it means having a culture where there are no silos between developers and operations, and everyone knows how to deploy and everyone knows how to do everything. But in reality, not everyone's a generalist. And being a generalist in some ways is kind of its own specialty, which is how I feel about the DevOps role that you see. So I think people might go by different names for the same idea, which is basically reliability engineering. Like, Google has a whole book about site reliability engineering. It's the same kind of philosophy, right? You want to keep things running. You want to know where things are. You want to make things efficient at an infrastructure level. But the way that you do it is, you use a lot of the same tools that developers use. So I think we'll see titles shift. Serverless architect is a big one that's coming up, because reliability engineering is big. And we may not see people say DevOps is their role as much, but I don't think the need for people who specialize in infrastructure and deployment and that kind of thing is going to go away. You might have to do more with less, right? Or there might be certain companies that just hire a bunch of them, like Google and Amazon, right? There are still going to be a lot of those people, but maybe they're not going to be working at your local place, because they're going to be working for the big companies that actually develop the tools everyone uses for this stuff. So I still think it's a great field, and it might be getting a little harder to figure out where to enter, because there's so much competition and attention around the tools and resources that people use. But it's still a really great field overall. And if you just learn, you know, serverless or Kubernetes or something that's big right now, you can start to branch out, and it's still a really good place to make a career.
Nate: [00:54:59] Yeah. Kubernetes. Oh man, that's a whole other podcast. We'll have to come back for that.
Steph: [00:55:02] Oh, it is. It is.
Nate: [00:55:04] So, Steph, tell us more about where we can learn more about you.
Steph: [00:55:07] Yeah. So I have a book coming out.
Nate: [00:55:10] Yes. Let's talk about the book.
Steph: [00:55:12] Yeah. So I'm releasing a book called... Fullstack Serverless. See, I'm terrible. I should know exactly what the title is. I don't...
Nate: [00:55:18] ...know exactly the title either. Yeah. Fullstack Python with Serverless, or Fullstack Serverless with Python?
Steph: [00:55:27] Fullstack Python on Lambda.
Nate: [00:55:29] Oh yeah. Lambda. Not serverless.
Steph: [00:55:31] Yeah, that's correct. Python on Lambda, right. And that book can really take you from start to finish, to really understand. I think if I had read this kind of book before learning it, it wouldn't feel so confusing, or like a black box where you don't know what's happening. Because really, at its core, with Lambda you can understand exactly everything that happens. It has a reason. You know it's running on infrastructure that's not too different from what people run on Docker or something, right? And the code that you write can be the same code that you might run on a server or on some other cloud provider.
So the real things that I think the book has, that are maybe kind of hard to find elsewhere: there's a lot of information about how you do proper testing and deployment. How do you manage your secrets, so you aren't storing those in environment variables, right? It has stuff about logging and monitoring, and all the different ways that you can trigger Lambda. So API Gateway, you know, that's a big one, but then I mentioned S3 and all those other ones. There are going to be examples of pretty much every way you can do that in the book. Stuff about optimizing cost and performance, and stuff about using the SAM Serverless Application Repository, so you can actually publish Lambdas and share them, and even sell them if you want to. So it's really start to finish, everything you need if you want to take something you create from scratch into production. I don't think there's anything left out that you would need to know. I feel pretty confident about that.
Nate: [00:57:04] It's great. I think one of the things I love about it is it's almost like the anti-version of the docs. Like, we talked earlier about how the docs cover every possible use case. This talks about very specific, but production, use cases in a very approachable, linear way. You know, even though you can find some tutorials online, like you mentioned, they're not always accurate in terms of how you actually do, or should do, things. So yeah, I think your book so far has been really great in covering these production concerns in a linear way. All right. Well, Steph, it's been great to have you.
Steph: [00:57:37] Thank you for having me. It was great talking to you both.
This episode is all about application integration, which on AWS is a suite of services that enable communication between decoupled components within microservices, distributed systems, and serverless applications. Products in this category include SQS, SNS and Kinesis, which I will discuss in this episode. For more info, check out the AWS documentation here: https://aws.amazon.com/products/application-integration
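[Editor's note: as a minimal illustration of the decoupling the episode describes, here is a boto3 sketch of sending and receiving through SQS. The queue name and message body are hypothetical, and an existing queue is assumed.]

    # send and receive a message through SQS with boto3
    import boto3

    sqs = boto3.client("sqs")
    queue_url = sqs.get_queue_url(QueueName="orders")["QueueUrl"]  # assumed queue

    sqs.send_message(QueueUrl=queue_url, MessageBody="order #42 created")

    msgs = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=5).get("Messages", [])
    for m in msgs:
        print(m["Body"])
        # delete after successful processing, or the message becomes visible again
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])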
The message queue is today at the heart of all our dear microservice applications, allowing each service to run fully autonomously without being coupled to its neighbor. But that is far from its only use case, given how much today's world revolves around events. When we think message queue, we often think of Kafka, or perhaps RabbitMQ or SQS. But do you know NATS? NATS is an open source project that is now more than 10 years old, and it has long since proven its robustness in products such as Pivotal Container Service, among others. NATS is also hosted today by the CNCF, which is, from my point of view, an undeniable mark of quality. In this episode, I have the pleasure of welcoming Ivan Kozlovic. Ivan is a software engineer at Synadia, the company that maintains and leads the development of NATS. With him, we will learn a bit more about message queuing and the challenges NATS tackles in this space.
Episode notes:
Synadia: https://synadia.com/
NATS.io: https://nats.io/
The GitHub repository: https://github.com/nats-io
The NATS YouTube channel: https://www.youtube.com/c/nats_messaging/videos
A video describing how an electric utility uses NATS: https://youtu.be/YB-zPHxrJ6k?t=979
Practical NATS: https://www.amazon.fr/Practical-NATS-Beginner-Pro-English-ebook/dp/B07DLTN6PK/ref=sr_1_1
Support the show (https://www.patreon.com/electromonkeys)
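[Editor's note: a minimal publish/subscribe sketch with the nats-py client, for readers who want to try the pattern discussed in the episode. The server URL, subject, and payload are assumptions.]

    # minimal NATS publish/subscribe (pip install nats-py)
    import asyncio
    import nats

    async def main():
        nc = await nats.connect("nats://localhost:4222")  # assumed local server

        async def handler(msg):
            print(f"received on {msg.subject}: {msg.data.decode()}")

        await nc.subscribe("orders.created", cb=handler)
        await nc.publish("orders.created", b"order #42")  # example subject/payload
        await asyncio.sleep(0.1)   # give the callback a moment to fire
        await nc.drain()

    asyncio.run(main())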
This is for all those who recognise they are happy in a senior delivery role - SQS, SPM or SBS - but want to know how to maximise their position --- Send in a voice message: https://anchor.fm/pegasusradio/message
In this episode of AWS TechChat, Pete and Shane are in Chicago and continue with part 2 of an update show covering some of the missed but very important updates that occurred in the last few months (November 2019 → January 2020) while we embraced re:Invent 2019. We start the show with some container news. Firstly, we have four GitHub Actions that provide hooks to accelerate your CI/CD pipeline. The actions cover credentials and secrets, through to ECR and deployment. They help developers focus on iterating with high velocity, with GitHub handling the heavy lifting of deployment. Amazon EKS, being ever popular, has had a limit increase: 100 Amazon EKS clusters per region per account. We continue with two networking-related updates: Access Control List (ACL) restrictions on public endpoints, and the ability to resolve the private Amazon EKS cluster endpoint when using a peered VPC. Finally, on the container front, we launched AWS Fargate Spot, only for Amazon ECS, allowing savings of up to 70%. Amazon EC2 Spot now provides instance launch notifications via Amazon CloudWatch Events, allowing you to up your observability and monitoring game. We then pivot to messaging updates. Amazon Simple Email Service (SES) now enables you to configure DomainKeys Identified Mail (DKIM) using your own RSA key pair. It now supports FIPS 140-2 compliant endpoints, and an account-level suppression list, which allows you to specify whether addresses should be added to the list when they result in hard bounces, when they result in complaints, or both. Amazon SNS now brings support for a DLQ: you can set a dead-letter queue (DLQ) on an SNS subscription to capture undeliverable messages and push them to an SQS queue. AWS Lambda now provides support for provisioning capacity, allowing you to prevent cold starts; it's another tool in your toolbox that may make AWS Lambda applicable to more workloads that require highly consistent latency. Lastly, we close the show with an Amazon EBS Fast Snapshot Restore (FSR) update. It eliminates the need for pre-warming data into volumes created from snapshots.
Speakers: Shane Baldacchino - Solutions Architect, ANZ, AWS Peter Stanski - Head of Solution Architecture, AWS Resources: AWS DataSync announces a 68% price reduction https://aws.amazon.com/about-aws/whats-new/2019/11/aws-datasync-announces-68-percent-price-reduction/ Amazon Elastic Container Service publishes multiple GitHub Actions https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-elastic-container-service-publishes-multiple-github-actions/ AWS for GitHub Actions https://github.com/aws-actions Amazon EKS Increases Limits to 100 Clusters per Region https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-eks-increases-limits-to-100-clusters-per-region/ Amazon EKS enables network access restrictions to Kubernetes cluster public endpoints https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-eks-enables-network-access-restrictions-to-kubernetes-cluster-public-endpoints/ DNS Resolution for EKS Clusters Using Private Endpoints https://aws.amazon.com/about-aws/whats-new/2019/12/dns-resolution-for-eks-clusters-using-private-endpoints/ AWS launches Fargate Spot, save up to 70% for fault tolerant applications https://aws.amazon.com/about-aws/whats-new/2019/12/aws-launches-fargate-spot-save-up-to-70-for-fault-tolerant-applications/ Amazon EC2 Spot Now Provides Instance Launch Notifications via Amazon CloudWatch Events https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-ec2-spot-now-provides-instance-launch-notifications-via-amazon-cloudwatch-events/ AWS Whats New (Webhook): Amazon SES now enables you to configure DKIM using your own RSA key pair https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-ses-now-enables-you-to-configure-dkim-using-your-own-rsa-/
Our guest is Alice Charasse, executive director of the Société Québécoise de la Schizophrénie (SQS). She introduces the SQS and tells us about its services. In the Espresso segment and the extended Espresso, we talk about distress among men. At the start of the second part of the show, our collaborator Ève Cardinal brings her topic: harassment and mental health at work. Host, interviewer and producer: Yvan Bujold. Sound engineer and Espresso segment contributor: Pierre Laporte. Folie Douce website: antenne.qc.ca © Copyright Antenne Communications. All rights reserved.
Other Codemotion 2019 talks are also available as podcasts: https://lk.autentia.com/Codemotion-Podcast How do you scale a product that registers more than 5,000 new users every day and more than 20M new requests? What if I told you that our users come from more than 100 countries? And what if I told you, on top of that, that we don't have a single DevOps engineer on the team? My name is Chema Roldán and I am CTO and co-founder of Genially. I would like to share with you how we managed to scale our infrastructure efficiently and in a real-world way. We will talk about how we use AWS services such as Route53, Elastic Beanstalk, Lambda, API Gateway, SQS and Elastic Container Service. We're going to have fun!
It is a MASSIVE episode of updates that Simon and Nikki do their best to cover! There is also an EXTRA SPECIAL bonus just for AWS Podcast listeners! Special Discount for Intersect Tickets: https://int.aws/podcast use discount code 'podcast' - note that tickets are limited! Chapters: 02:19 Infrastructure 03:07 Storage 05:34 Compute 13:47 Network 14:54 Databases 17:45 Migration 18:36 Developer Tools 21:39 Analytics 29:25 IoT 33:24 End User Computing 34:08 Machine Learning 40:21 AR and VR 41:11 Application Integration 43:57 Management and Governance 48:04 Customer Engagement 49:13 Media 50:17 Mobile 50:36 Security 51:26 Gaming 51:39 Robotics 52:13 Training Shownotes: Special Discount for Intersect Tickets: https://int.aws/podcast use discount code 'podcast' - note that tickets are limited! Topic || Infrastructure Announcing the new AWS Middle East (Bahrain) Region | https://aws.amazon.com/about-aws/whats-new/2019/07/announcing-the-new-aws-middle-east--bahrain--region-/ Topic || Storage EBS default volume type updated to GP2 | https://aws.amazon.com/about-aws/whats-new/2019/07/ebs-default-volume-type-updated-to-gp2/ AWS Backup will Automatically Copy Tags from Resource to Recovery Point | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-backup-will-automatically-copy-tags-from-resource-to-recovery-point/ Configuration update for Amazon EFS encryption of data in transit | https://aws.amazon.com/about-aws/whats-new/2019/07/configuration-update-for-amazon-efs-encryption-data-in-transit/ AWS Snowball and Snowball Edge available in Seoul – Amazon Web Services | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-snowball-and-aws-snowball-edge-available-in-asia-pacific-seoul-region/ Amazon S3 adds support for percentiles on Amazon CloudWatch Metrics | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-s3-adds-support-for-percentiles-on-amazon-cloudwatch-metrics/ Amazon FSx Now Supports Windows Shadow Copies for Restoring Files to Previous Versions | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-fsx-now-supports-windows-shadow-copies-for-restoring-files-to-previous-versions/ Amazon CloudFront Announces Support for Resource-Level and Tag-Based Permissions | https://aws.amazon.com/about-aws/whats-new/2019/08/cloudfront-resource-level-tag-based-permission/ Topic || Compute Amazon EC2 AMD Instances are Now Available in additional regions | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ec2-amd-instances-available-in-additional-regions/ Amazon EC2 P3 Instances Featuring NVIDIA Volta V100 GPUs now Support NVIDIA Quadro Virtual Workstation | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ec2-p3-nstances-featuring-nvidia-volta-v100-gpus-now-support-nvidia-quadro-virtual-workstation/ Introducing Amazon EC2 I3en and C5n Bare Metal Instances | https://aws.amazon.com/about-aws/whats-new/2019/08/introducing-amazon-ec2-i3en-and-c5n-bare-metal-instances/ Amazon EC2 C5 New Instance Sizes are Now Available in Additional Regions | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-ec2-c5-new-instance-sizes-are-now-available-in-additional-regions/ Amazon EC2 Spot Now Available for Red Hat Enterprise Linux (RHEL) | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ec2-spot-now-available-red-hat-enterprise-linux-rhel/ Amazon EC2 Now Supports Tagging Launch Templates on Creation | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ec2-now-supports-tagging-launch-templates-on-creation/ Amazon EC2 On-Demand Capacity Reservations Can Now Be Shared Across 
Multiple AWS Accounts | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ec2-on-demand-capacity-reservations-shared-across-multiple-aws-accounts/ Amazon EC2 Fleet Now Lets You Modify On-Demand Target Capacity | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-ec2-fleet-modify-on-demand-target-capacity/ Amazon EC2 Fleet Now Lets You Set A Maximum Price For A Fleet Of Instances | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-ec2-fleet-now-lets-you-submit-maximum-price-for-fleet-of-instances/ Amazon EC2 Hibernation Now Available on Ubuntu 18.04 LTS | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ec2-hibernation-now-available-ubuntu-1804-lts/ Amazon ECS services now support multiple load balancer target groups | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ecs-services-now-support-multiple-load-balancer-target-groups/ Amazon ECS Console now enables simplified AWS App Mesh integration | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ecs-console-enables-simplified-aws-app-mesh-integration/ Amazon ECR now supports increased repository and image limits | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ecr-now-supports-increased-repository-and-image-limits/ Amazon ECR Now Supports Immutable Image Tags | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ecr-now-supports-immutable-image-tags/ Amazon Linux 2 Extras now provides AWS-optimized versions of new Linux Kernels | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-linux-2-extras-provides-aws-optimized-versions-of-new-linux-kernels/ Lambda@Edge Adds Support for Python 3.7 | https://aws.amazon.com/about-aws/whats-new/2019/08/lambdaedge-adds-support-for-python-37/ AWS Batch Now Supports the Elastic Fabric Adapter | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-batch-now-supports-elastic-fabric-adapter/ Topic || Network Elastic Fabric Adapter is officially integrated into Libfabric Library | https://aws.amazon.com/about-aws/whats-new/2019/07/elastic-fabric-adapter-officially-integrated-into-libfabric-library/ Now Launch AWS Glue, Amazon EMR, and AWS Aurora Serverless Clusters in Shared VPCs | https://aws.amazon.com/about-aws/whats-new/2019/08/now-launch-aws-glue-amazon-emr-and-aws-aurora-serverless-clusters-in-shared-vpcs/ AWS DataSync now supports Amazon VPC endpoints | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-datasync-now-supports-amazon-vpc-endpoints/ AWS Direct Connect Now Supports Resource Based Authorization, Tag Based Authorization, and Tag on Resource Creation | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-direct-connect-now-supports-resource-based-authorization-tag-based-authorization-tag-on-resource-creation/ Topic || Databases Amazon Aurora Multi-Master is Now Generally Available | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-aurora-multimaster-now-generally-available/ Amazon DocumentDB (with MongoDB compatibility) Adds Aggregation Pipeline and Diagnostics Capabilities | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-documentdb-with-mongodb-compatibility-adds-aggregation-pipeline-and-diagnostics-capabilities/ Amazon DynamoDB now helps you monitor as you approach your account limits | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-dynamodb-now-helps-you-monitor-as-you-approach-your-account-limits/ Amazon RDS for Oracle now supports new instance sizes | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-rds-for-oracle-now-supports-new-instance-sizes/ Amazon RDS for Oracle 
Supports Oracle Management Agent (OMA) version 13.3 for Oracle Enterprise Manager Cloud Control 13c | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-rds-for-oracle-supports-oracle-management-agent-oma-version133-for-oracle-enterprise-manager-cloud-control13c/ Amazon RDS for Oracle now supports July 2019 Oracle Patch Set Updates (PSU) and Release Updates (RU) | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-rds-for-oracle-supports-july-2019-oracle-patch-set-and-release-updates/ Amazon RDS SQL Server now supports changing the server-level collation | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-rds-sql-server-supports-changing-server-level-collation/ PostgreSQL 12 Beta 2 Now Available in Amazon RDS Database Preview Environment | https://aws.amazon.com/about-aws/whats-new/2019/08/postgresql-beta-2-now-available-in-amazon-rds-database-preview-environment/ Amazon Aurora with PostgreSQL Compatibility Supports Publishing PostgreSQL Log Files to Amazon CloudWatch Logs | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-aurora-with-postgresql-compatibility-support-logs-to-cloudwatch/ Amazon Redshift Launches Concurrency Scaling in Five additional AWS Regions, and Enhances Console Performance Graphs in all supported AWS Regions | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-redshift-launches-concurrency-scaling-five-additional-regions-enhances-console-performance-graphs/ Amazon Redshift now supports column level access control with AWS Lake Formation | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-redshift-spectrum-now-supports-column-level-access-control-with-aws-lake-formation/ Topic || Migration AWS Migration Hub Now Supports Import of On-Premises Server and Application Data From RISC Networks to Plan and Track Migration Progress | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-migration-hub-supports-import-of-on-premises-server-application-data-from-risc-networks-to-track-migration-progress/ Topic || Developer Tools AWS CodePipeline Achieves HIPAA Eligibility | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-codepipeline-achieves-hipaa-eligibility/ AWS CodePipeline Adds Pipeline Status to Pipeline Listing | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-codepipeline-adds-pipeline-status-to-pipeline-listing/ AWS Amplify Console adds support for automatically deploying branches that match a specific pattern | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-amplify-console-support-git-based-branch-pattern-detection/ Amplify Framework Adds Predictions Category | https://aws.amazon.com/about-aws/whats-new/2019/07/amplify-framework-adds-predictions-category/ Amplify Framework adds local mocking and testing for GraphQL APIs, Storage, Functions, and Hosting | https://aws.amazon.com/about-aws/whats-new/2019/08/amplify-framework-adds-local-mocking-and-testing-for-graphql-apis-storage-functions-hostings/ Topic || Analytics AWS Lake Formation is now generally available | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-lake-formation-is-now-generally-available/ Announcing PartiQL: One query language for all your data | https://aws.amazon.com/blogs/opensource/announcing-partiql-one-query-language-for-all-your-data/ AWS Glue now supports the ability to run ETL jobs on Apache Spark 2.4.3 (with Python 3) | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-glue-now-supports-ability-to-run-etl-jobs-apache-spark-243-with-python-3/ AWS Glue now supports additional configuration options for memory-intensive
jobs submitted through development endpoints | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-glue-now-supports-additional-configuration-options-for-memory-intensive-jobs-submitted-through-deployment-endpoints/ AWS Glue now provides the ability to bookmark Parquet and ORC files using Glue ETL jobs | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-glue-now-provides-ability-to-bookmark-parquet-and-orc-files-using-glue-etl-jobs/ AWS Glue now provides FindMatches ML transform to deduplicate and find matching records in your dataset | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-glue-provides-findmatches-ml-transform-to-deduplicate/ Amazon QuickSight adds support for custom colors, embedding for all user types and new regions! | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-quicksight-adds-support-for-custom-colors-embedding-for-all-user-types-and-new-regions/ Achieve 3x better Spark performance with EMR 5.25.0 | https://aws.amazon.com/about-aws/whats-new/2019/08/achieve-3x-better-spark-performance-with-emr-5250/ Amazon EMR now supports native EBS encryption | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon_emr_now_supports_native_ebs_encryption/ Amazon Athena adds Support for AWS Lake Formation Enabling Fine-Grained Access Control on Databases, Tables, and Columns | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-athena-adds-support-for-aws-lake-formation-enabling-fine-grained-access-control-on-databases-tables-columns/ Amazon EMR Integration With AWS Lake Formation Is Now In Beta, Supporting Database, Table, and Column-level access controls for Apache Spark | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-emr-integration-with-aws-lake-formation-now-in-beta-supporting-database-table-column-level-access-controls/ Topic || IoT AWS IoT Device Defender Expands Globally | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-iot-device-defender-expands-globally/ AWS IoT Device Defender Supports Mitigation Actions for Audit Results | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-iot-device-defender-supports-mitigation-actions-for-audit-results/ AWS IoT Device Tester v1.3.0 is Now Available for Amazon FreeRTOS 201906.00 Major | https://aws.amazon.com/about-aws/whats-new/2019/07/aws_iot_device_tester_v130_for_amazon_freertos_201906_00_major/ AWS IoT Events actions now support AWS Lambda, SQS, Kinesis Firehose, and IoT Events as targets | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-iot-events-supports-invoking-actions-to-lambda-sqs-kinesis-firehose-iot-events/ AWS IoT Events now supports AWS CloudFormation | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-iot-events-now-supports-aws-cloudformation/ Topic || End User Computing AWS Client VPN now adds support for Split-tunnel | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-client-vpn-now-adds-support-for-split-tunnel/ Introducing AWS Chatbot (beta): ChatOps for AWS in Amazon Chime and Slack Chat Rooms | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-aws-chatbot-chatops-for-aws/ Amazon AppStream 2.0 Adds CLI Operations for Programmatic Image Creation | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-appstream-2-adds-cli-operations-for-programmatic-image-creation/ NICE DCV Releases Version 2019.0 with Multi-Monitor Support on Web Client | https://aws.amazon.com/about-aws/whats-new/2019/08/nice-dcv-releases-version-2019-0-with-multi-monitor-support-on-web-client/ New End User Computing Competency Solutions | 
https://aws.amazon.com/about-aws/whats-new/2019/08/end-user-computing-competency-solutions/ Amazon WorkDocs Migration Service | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon_workdocs_migration_service/ Topic || Machine Learning SageMaker Batch Transform now enables associating prediction results with input attributes | https://aws.amazon.com/about-aws/whats-new/2019/07/sagemaker-batch-transform-enable-associating-prediction-results-with-input-attributes/ Amazon SageMaker Ground Truth Adds Data Labeling Workflow for Named Entity Recognition | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-sagemaker-ground-truth-adds-data-labeling-workflow-for-named-entity-recognition/ Amazon SageMaker notebooks now available with pre-installed R kernel | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-sagemaker-notebooks-available-with-pre-installed-r-kernel/ New Model Tracking Capabilities for Amazon SageMaker Are Now Generally Available | https://aws.amazon.com/about-aws/whats-new/2019/08/new-model-tracking-capabilities-for-amazon-sagemaker-now-generally-available/ Amazon Comprehend Custom Entities now supports multiple entity types | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-comprehend-custom-entities-supports-multiple-entity-types/ Introducing Predictive Maintenance Using Machine Learning | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-predictive-maintenance-using-machine-learning/ Amazon Transcribe Streaming Now Supports WebSocket | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-transcribe-streaming-now-supports-websocket/ Amazon Polly Launches Neural Text-to-Speech and Newscaster Voices | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-polly-launches-neural-text-to-speech-and-newscaster-voices/ Manage a Lex session using APIs on the client | https://aws.amazon.com/about-aws/whats-new/2019/08/manage-a-lex-session-using-apis-on-the-client/ Amazon Rekognition now detects violence, weapons, and self-injury in images and videos; improves accuracy for nudity detection | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-rekognition-now-detects-violence-weapons-and-self-injury-in-images-and-videos-improves-accuracy-for-nudity-detection/ Topic || AR and VR Amazon Sumerian Now Supports Physically-Based Rendering (PBR) | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-sumerian-now-supports-physically-based-rendering-pbr/ Topic || Application Integration Amazon SNS Message Filtering Adds Support for Attribute Key Matching | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-sns-message-filtering-adds-support-for-attribute-key-matching/ Amazon SNS Adds Support for AWS X-Ray | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-sns-adds-support-for-aws-x-ray/ Temporary Queue Client Now Available for Amazon SQS | https://aws.amazon.com/about-aws/whats-new/2019/07/temporary-queue-client-now-available-for-amazon-sqs/ Amazon MQ Adds Support for AWS Key Management Service (AWS KMS), Improving Encryption Capabilities | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-mq-adds-support-for-aws-key-management-service-improving-encryption-capabilities/ Amazon MSK adds support for Apache Kafka version 2.2.1 and expands availability to EU (Stockholm), Asia Pacific (Mumbai), and Asia Pacific (Seoul) | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-msk-adds-support-apache-kafka-version-221-expands-availability-stockholm-mumbai-seoul/ Amazon API Gateway supports secured connectivity between REST 
APIs & Amazon Virtual Private Clouds in additional regions | https://aws.amazon.com/about-aws/whats-new/2019/08/amazon-api-gateway-supports-secured-connectivity-between-reset-apis-and-amazon-virtual-private-clouds-in-additional-regions/ Topic || Management and Governance AWS Cost Explorer now Supports Usage-Based Forecasts | https://aws.amazon.com/about-aws/whats-new/2019/07/usage-based-forecasting-in-aws-cost-explorer/ Introducing Amazon EC2 Resource Optimization Recommendations | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-amazon-ec2-resource-optimization-recommendations/ AWS Budgets Announces AWS Chatbot Integration | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-budgets-announces-aws-chatbot-integration/ Discovering Documents Made Easy in AWS Systems Manager Automation | https://aws.amazon.com/about-aws/whats-new/2019/07/discovering-documents-made-easy-in-aws-systems-manager-automation/ AWS Systems Manager Distributor makes it easier to create distributable software packages | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-systems-manager-distributor-makes-it-easier-to-create-distributable-software-packages/ Now use AWS Systems Manager Maintenance Windows to select resource groups as targets | https://aws.amazon.com/about-aws/whats-new/2019/07/now-use-aws-systems-manager-maintenance-windows-to-select-resource-groups-as-targets/ Use AWS Systems Manager to resolve operational issues with your .NET and Microsoft SQL Server Applications | https://aws.amazon.com/about-aws/whats-new/2019/08/use-aws-systems-manager-to-resolve-operational-issues-with-your-net-and-microsoft-sql-server-applications/ CloudWatch Logs Insights adds cross log group querying | https://aws.amazon.com/about-aws/whats-new/2019/07/cloudwatch-logs-insights-adds-cross-log-group-querying/ AWS CloudFormation now supports higher StackSets limits | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-cloudformation-now-supports-higher-stacksets-limits/ Topic || Customer Engagement Introducing AI-Driven Social Media Dashboard | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-ai-driven-social-media-dashboard/ New Amazon Connect integration for ChoiceView from Radish Systems on AWS | https://aws.amazon.com/about-aws/whats-new/2019/07/new-amazon-connect-integration-for-choiceview-from-radish-systems-on-aws/ Amazon Pinpoint Adds Campaign and Application Metrics APIs | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-pinpoint-adds-campaign-and-application-metrics-apis/ Topic || Media AWS Elemental Appliances and Software Now Available in the AWS Management Console | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-elemental-appliances-and-software-now-available-in-aws-management-console/ AWS Elemental MediaConvert Expands Audio Support and Improves Performance | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-elemental-mediaconvert-expands-audio-support-and-improves-performance/ AWS Elemental MediaConvert Adds Ability to Prioritize Transcoding Jobs | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-elemental-mediaconvert-adds-ability-to-prioritize-transcoding-jobs/ AWS Elemental MediaConvert Simplifies Editing and Sharing of Settings | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-elemental-mediaconvert-simplifies-editing-and-sharing-of-settings/ AWS Elemental MediaStore Now Supports Resource Tagging | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-elemental-mediastore-now-supports-resource-tagging/ AWS Elemental MediaLive Enhances 
Support for File-Based Inputs for Live Channels | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-elemental-medialive-enhances-support-for-file-based-inputs-for-live-channels/ Topic || Mobile AWS Device Farm improves device start up time to enable instant access to devices | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-device-farm-improves-device-start-up-time-to-enable-instant-access-to-devices/ Topic || Security Introducing the Amazon Corretto Crypto Provider (ACCP) for Improved Cryptography Performance | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-the-amazon-corretto-crypto-provider/ AWS Secrets Manager now supports VPC endpoint policies | https://aws.amazon.com/about-aws/whats-new/2019/07/AWS-Secrets-Manager-now-supports-VPC-endpoint-policies/ Topic || Gaming Lumberyard Beta 1.20 Now Available | https://aws.amazon.com/about-aws/whats-new/2019/07/lumberyard-beta-120-now-available/ Topic || Robotics AWS RoboMaker now supports offline logs and metrics for the AWS RoboMaker CloudWatch cloud extension | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-robomaker-now-supports-offline-logs-metrics-aws-robomaker-cloudwatch-cloud-extension/ Topic || Training New AWS Certification Exam Vouchers Make Certifying Groups Easier | https://aws.amazon.com/about-aws/whats-new/2019/07/new-aws-certification-exam-vouchers-make-certifying-groups-easier/ Announcing New Resources and Website to Accelerate Your Cloud Adoption | https://aws.amazon.com/about-aws/whats-new/2019/07/announcing-new-resources-and-website-to-accelerate-your-cloud-adoption/ AWS Developer Series Relaunched on edX | https://aws.amazon.com/about-aws/whats-new/2019/08/aws-developer-series-relaunched-on-edx/
In this session we introduce AWS Cloud Map, a new service that lets you build the map of your cloud. It allows you to define friendly names for any resource such as S3 buckets, DynamoDB tables, SQS queues, or custom cloud services built on EC2, ECS, EKS, or Lambda. Your applications can then discover resource location, credentials, and metadata by friendly name using the AWS SDK and authenticated API queries. You can further filter resources discovered by custom attributes, such as deployment stage or version. AWS Cloud Map is a highly available service with rapid configuration change propagation.
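[Editor's note: a minimal sketch of the discovery-by-friendly-name flow described above, using boto3. The namespace, service name, and attribute filter are hypothetical.]

    # discover instances registered in AWS Cloud Map by friendly name
    import boto3

    sd = boto3.client("servicediscovery")

    resp = sd.discover_instances(
        NamespaceName="example.local",      # hypothetical namespace
        ServiceName="payments",             # hypothetical service
        QueryParameters={"stage": "prod"},  # filter on custom attributes
    )
    for inst in resp["Instances"]:
        print(inst["InstanceId"], inst["Attributes"])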
On this week’s podcast, Wes Reisz talks with Ben Kehoe of iRobot. Ben is a Cloud Robotics Research Scientist, where he works on using the Internet to allow robots to do more and better things. AWS, and in particular Lambda, is a core part of cloud-enabled robots. The two discuss iRobot’s cloud architecture. Some of the key lessons on the podcast include: thoughts on logging, deploying, unit/integration testing, service discovery, minimizing costs of service-to-service calls, and Conway’s Law. Why listen to this podcast: - The AWS platform, including services such as Kinesis, Lambda, and IoT Gateway, was a key component in allowing iRobot to build out everything they needed for Internet-connected robots in 2015. - Cloud-enabled Roombas talk to the cloud via the IoT Gateway (which is MQTT) and are able to perform large file uploads using mutually authenticated certificates signed via an iRobot Certificate Authority. The entire system is event-driven, with Lambda being used to perform actions based on the events that occur. - When you’re using serverless, you are using managed infrastructure rather than building your own. So that means, where limitations exist, you have to accept the limitations of the infrastructure. For example, until recently Lambda didn’t have an SQS integration. Because of that limitation, you have to find inventive ways to make things work as you want. - Serverless is all about the total cost of ownership. It’s not just about development time, but about all the areas that need to support operating the environment. - iRobot takes an approach of unit testing functions locally but doing integration testing on a deployed set of functions. A library called Placebo helps engineers record events sent to the cloud and then replay them for local unit tests. - For logging/tracing, iRobot packages up information that a function uses into a structured record that is sent to CloudWatch. They then pipe that into SumoLogic to be able to trace executions. Most of the difficulties tend to happen closer to the edge. - iRobot uses red/black deployments to have a completely separate stack when deploying. In addition, they flatten (or inline) their function calls on deployment. Both of these techniques are used as cost optimizations to prevent Lambdas calling Lambdas when not needed. - Looking towards the future of serverless, there is still work to be done to offer the same feature set that more traditional applications can use with service meshes. More on this: Quick scan our curated show notes on InfoQ https://bit.ly/2NYsXNN You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq Subscribe: www.youtube.com/infoq Like InfoQ on Facebook: bit.ly/2jmlyG8 Follow on Twitter: twitter.com/InfoQ Follow on LinkedIn: www.linkedin.com/company/infoq Check the landing page on InfoQ: https://bit.ly/2NYsXNN
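[Editor's note: a sketch of the record-and-replay testing pattern described above, using the Placebo library the episode mentions. The data path and the describe_instances call are illustrative assumptions.]

    # record AWS responses once, then replay them in unit tests (pip install placebo)
    import boto3
    import placebo

    session = boto3.Session()
    pill = placebo.attach(session, data_path="tests/placebo")  # saved-response dir

    pill.record()                 # live calls below are saved as JSON
    ec2 = session.client("ec2")
    ec2.describe_instances()

    pill.stop()
    pill.playback()               # the same call now replays the saved response,
    ec2 = session.client("ec2")   # no AWS access needed
    print(ec2.describe_instances()["ResponseMetadata"]["HTTPStatusCode"])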
We sit down with facilitator, instructional designer, meeting host and leadership consultant David Foster of Capgemini to talk about the importance of social and emotional intelligence.
David's LinkedIn
TRANSCRIPT
Ade: "EQ is our ability to manage ourselves and our emotions. In the workplace, this means acting and reacting to events appropriately, such as maintaining your composure and ability to perform under pressure. However, as important as EQ is, it is also necessary but not sufficient for success. Confidence in navigating the workplace culture, high SQ, is the major obstacle for women and minorities. Culture is largely shaped by the dominant group, which for most workplaces is straight white men. This is not a conspiracy or a plot. We all tend to befriend people who are similar to us or with whom we have the most in common. We take work breaks with our buddy. We grab a quick lunch with our friend. Women do this. Minorities do this. Straight white men do this. For the latter group, however, this often results in power begetting power. Women and minorities in particular need to have high SQs. They need to be perceptive, vigilant, and deliberate in how they navigate the workplace culture. Not being automatically part of the workplace power club is a given for women and minorities. We can bemoan that fact, or we can take action. Taking offense or feeling hurt keeps us stuck. Successfully navigating the workplace culture--demonstrating high SQ--is the key to career growth and success." The excerpt I just read from Smart Is Not Enough: Why Social Intelligence (SQ) may be the key to career success for women and minorities by Phyllis Levinson challenges what being good enough looks like in the working world and argues that social and emotional intelligence are the secret sauces to climbing the corporate ladder. How do people groups with lesser social capital and access thrive in these highly competitive spaces? This is Ade, and you're listening to Living Corporate. So today we're talking about social and emotional intelligence. Zach: Yeah. So I know you gave the definitions in your intro, but when I think of definitions for these terms, I think of it as emotional intelligence being your ability to understand and manage yourself, whereas social intelligence is your ability to understand and manage the relationships around you. Ade: That's about right. And I think it's interesting because I would argue that by the nature of black and brown folks being the minority, minorities in the workplace have some of the highest emotional intelligence, right? I mean, I know I'm always thinking about how I'm going to come across, how to speak, how to phrase my questions both in email and in person, and, you know, not live up or down to some stereotypes and come across as angry. And I'd say that's pretty common. I think that code-switching speaks to this phenomenon the most. The fact that we change our voices with the hopes of being accepted and making others feel more comfortable with us speaks to a certain level of emotional intelligence, no? Zach: No, I absolutely agree. And look, I don't think we're saying that minorities don't need help in better developing and honing their emotional intelligence, but it is me saying that you don't often see minorities in the corporate workplace with emotional, like, outbursts. In your experience, how many times have you seen someone that was not white just completely lose control at work, Ade? Ade: Never, and I definitely get your point. 
Your point is well-taken, but to me the social intelligence part is a huge hurdle. So the article you referenced earlier is interesting because I posit that if power resides with the majority group and people of color don't heavily engage with the majority--like you were saying, people tend to associate with people who are most like them--how do we learn how to navigate those spaces?Zach: It kind of--it actually kind of throws the whole idea or the term of social intelligence into question, right? Because it's not particularly an issue of mental capacity or capability as it is access. Like, I don't know how to manage this particular relationship in the workplace, not because I'm inept but because I don't have access to these relationships in the same ways as folks who don't look like me are. I mean, am I--am I tripping? Am I onto something?Ade: I do think you're onto something. It reminds me of our very first episode with Fenorris when he was talking about the white executive giving him the real talk in that plane, which by the way, side note, I know y'all have been rocking with us for a while, but if you haven't listened to our very first episode with Fenorris Pearson you definitely should go give it a listen. Back to reality. Fenorris was saying that it is essentially obvious when his black colleagues were trying to mimic behavior and mimic a culture that isn't necessarily theirs, and it built more distrust than not ironically. You might also remember this conversation about authenticity in our episode with Janet Pope essentially saying that people who find themselves in the minority, particularly folks of color, often put on personas that we believe mirrors that of the majority when in actuality the people around us who we're trying to mirror don't recognize themselves and they recognize that lack of authenticity.Zach: Right, and that's not really our fault. Like I said before, we don't have access because historically we haven't been allowed access. We're just now really engaging in these spaces [inaudible]. It's only been what, like, 50 years since the last civil rights bill was passed? So it's been, like, a pretty short line. The point is because of the way that Corporate America is set up, we have to have skills that extend beyond the X's and O's. It's not just critical for our growth, it's really needed for our corporate survival.Ade: Right. And you know, it would be great if we could at some point, I mean, over the course of this season, be able to speak to someone who is a bit of a subject matter expert on social and emotional intelligence. Maybe someone with outstanding communication, conflict resolution and interpersonal skills, and I would feel really comfortable, even more comfortable, maybe if they had maybe 20 years of experience as an instructional designer, a corporate facilitator and [inaudible]. And just to put some nice little icing on top, if they were actually responsible for the coaching and professional development of executives for an international consulting firm, I might just faint.Zach: Oh, you mean like our guest David Foster?Zach and Ade: Whaaaaaat?Zach: *imitating air horns* Sound Man, you know what it is. Put 'em right there. Let's go. Ade: That's never gonna fail to make me laugh. All right, so next up we're gonna get into our interview with our guest, Mr. David Foster. Hope y'all enjoy.Zach: And we're back. And as we said, we have David Foster on the show. David, welcome to the show, man. How are you doing?David: Hey. I'm doing great, Zach. 
Thanks for inviting me. A real pleasure.Zach: Absolutely, man. So look, as you know, today we're talking about the importance of emotional intelligence in the workplace. Can you talk to us about what emotional intelligence is and how it comes into play with how you do your job?David: Yeah. So a couple things, you know? I work as a facilitator in Capgemini's Accelerated Solutions Environment. You know, despite the fact that we're a technology company we're really in the people business, and, you know, what we specialize in in the ASE is helping people getting aligned really quickly, helping them making decisions, and helping them come up with really innovative solutions to really wicked, challenging problems, and that's not something that you can do without having a high degree of emotional intelligence. You know, as a facilitator I'm typically at the front of the room, and for me it's not really about presenting myself as an expert as much as it is shepherding people through our process. So emotional intelligence for me is something that I have to pay real close attention to. You know, when I think about it, there are a couple of pieces to emotional intelligence. You've got the idea of just perceiving emotions, and so for me, you know, when I'm in front of an audience or a client group, it's about trying to understand where they are emotionally. And a lot of times we're dealing with really charged topics, so understanding what position they are on that rollercoaster is really important, you know? And that's the other part of it is, like, understanding emotions. So you can perceive them and you can feel them, but you have to be able to interpret them a little bit, a lot of bit, you know? That helps you decide what questions you need to ask or helps you decide how you might shift the focus of a session or how you might even capitalize on the emotions that you're perceiving. You know, for me and my position, it's about managing that emotion sometimes, and I'm speaking not only about the client and about the audience, but I'm speaking about myself as a facilitator. Look, we're all human. You know that, Zach. Right? Like, we're all human beings, and when you're standing up in front of a group or even if it's one-on-one, the emotion that comes off of someone or someones, you feel that, right? And so sometimes it's about not only managing the emotion that's coming from folks--maybe it's questioning, you know, the origin of it or where it's coming from, but it's also understanding what it's doing to you, you know? Because it can certainly either trigger your emotions--it might put you in a position where you end up feeling some emotions, you know, based on empathy with a group, but managing those emotions is key. And then it's really about using emotions. So if I think about those four things, like perceiving, understanding, managing and then using--and when I say using, it's not--you're not trying to take advantage of folks in terms of using emotion, but you're looking at and perceiving those emotions, understanding them and trying to figure out, "Okay, how best can we tap into this to help us achieve our goals?" So if there's energy and intent to do something, you know, how do we make sure that we put people in the position so that they can do that? Emotional intelligence is essential, you know? And it's not just in my role. I think it's in every role in our corporate environment, you know? Because like I said, we're a people business, and people have emotions, you know? 
We are emotional, sentient beings, and so if you think that just your IQ is enough, I think you're sadly mistaken. So that's--in a nutshell, I think, you know, the synopsis of how I think about EQ and how I think about emotional intelligence and it impacts me when it comes to how I do my job as a facilitator. Now, I can extend that even further, you know? There are lots of touch points where I'm not only interacting with colleagues or I'm interacting with clients in different ways, you know? And emotional intelligence extends beyond just when you're in front of the room. It has to do with your interpersonal relationships in terms of how you work with others, you know, how you contribute to a team and how you ultimately can add value to an organization, so.Zach: See, that's so intriguing. So have you had any situations--rather, have you had any situations where you've seen business relationships completely be broken by a lack of emotional intelligence? And if so, would you mind sharing a story?David: Yeah. You know what? Broken is, like, the end, but I think there's a continuum. If you're not keen on or at least focused on emotional intelligence, you can fracture relationships, you can damage relationships. So there's a whole lot that you can do outside of just breaking them. I just did a session this weekend that's really interesting. The guy that was one of the main sponsors of our session, the CIO, you know, he's taken the DiSC profile, and I have my own opinions about assessments. I think they're all information, you know? I don't know if that truly defines who you are and how you are as much as it just gives you information to help you decide how you might proceed in terms of your relationships or in terms of your preferences. And this guy, you know, he had taken the DiSC profile, and so he characterizes himself as a driver, you know? "I'm just a high D. I'm a high D." And it's almost like he uses that as his lead into any sort of conversation, you know? Not to mention that he's also a lawyer by trade, you know? And he's got a penchant for, you know, winning arguments no matter the cost, and he has a penchant for arguing and driving people very, very hard no matter the cost. So here we are in this ASE session, and, you know, the way we work is we have large-group stuff and then we get into breakouts, and I always talk to my sponsors about, you know, when you get into these breakouts you want your people to do the work, and you want to almost sit back, and you want to ask more questions than give more answers, and you don't want to stand up and pontificate. Well, he took this opportunity--they were sharing some information about a particular work stream, and he took this opportunity in front of, you know, a small group of folks to run up one side of this person and down the other, basically asking a lot of pointed questions, creating an argument, trying to win an argument about why certain work hadn't been done, right? And what I saw happen was not only did that change the tone and the tenor of the breakout, but it also changed the tone and tenor of their relationship for the rest of the session, where this person who had been on the receiving end of these very pointed and very argumentative sort of interjections, you know, almost shut down, right? And you don't want to do that, and I think about that, specifically in the session seeing that, but I was wondering, "Man, what is it like every day to work with this person if that's what you have to deal with?" 
And I actually pulled her aside to check on her and said, you know, "Are you doing okay?" And she said, "That's my everyday." And so when you think about that--you know, here you have this leader who is, you know, putting out front the idea that because "I'm a D, because I'm a high driver, I almost don't have to pay attention to how or what I do and how or what I say impacts the folks that I'm saying it to," because he can hold that shield up in front. And like I said, those assessments and those types of things are really only information, and the fact that he took that opportunity to basically confront this person, you know, not really understanding--well, it's not even not that--he understood what we were doing, but not being sensitive enough or being aware enough to know, you know, what those actions could possibly do to that person within our session. You know, that indicated a pretty severe lack of emotional intelligence. Now, whether or not he's able to repair that relationship I think is up to him. You know, Zach, I've got--and we've talked before about leadership, and we've talked before about, you know, how to lead and different styles of leadership, and I think EQ is, like, a really important arrow in the quiver. It's just one thing, you know? And having a high degree of emotional intelligence allows you to not only be self-aware, but it also allows you to be flexible, right? If you're--if you're focused not only on the things that are triggers for you, your own emotions, you know, that's part of it. You have to pay attention to the other emotions, and you almost have to--you have to be flexible, and you have to be able to adapt your approach, and you have to be able to adapt how you communicate based on the emotions of the other folk in the room, you know? Not just yours, but others, and it was obviously--it was a pretty charged conversation. He had some things he wanted to get out, but there's a way of communicating that so that you don't, like you said, break or damage your relationship. And just to extend the story further, you know, I had a confrontation with him. He wanted to--we have this thing in the ASE called proposals where, you know, people put proposals in front of a group of judges to--you know, what does the way forward look like? Take your best shot, right? So we have--we have the judges, and, you know, he wanted to be a judge, and I told him--I said, "I don't know if that's a good idea." I said, "Based on your closeness to the problem, based on your position in the organization, and based on what I observed," you know, based on how his interactions could change the tone and tenor of conversations, I advised him against it. And he didn't push too hard on that, and he said, "Well, how do the judges work?" I said, "Well, they develop criteria," and he said, "I want to be part of that conversation." And I stopped him and I said, you know, "What's your interest?" Right? And he said, "I want to make sure that my opinions are represented," and I proceeded to lay it out for him. I said, "Look, you know, ASE sessions are a chance for you to let the people in the room own the work, and it's a great chance for leaders to watch their people work. You know, you've got some smart folks here, you know? And you almost have to trust that they're gonna come up with the right criteria," et cetera, et cetera, and Zach, we went back and forth.Zach: Really?David: And talk about emotional intelligence. You know, at that point I have to know what my triggers are, right? 
So I could've gotten into this back-and-forth argument, but I have to remember my role. My role is a facilitator, right? I can't really hold a position. And I told him that. I said, "I'm not gonna hold a position. As a matter of fact, I'm not gonna argue with you." I said, "I've laid out the risks. I've told you what could happen if you involve yourself in this conversation. Ultimately it's up to you to make the choice, and I'm not gonna stand in your way, but you can't come back to me and look at me and say, "That didn't go the way I thought it would," because I cautioned you and I warned you," and I said, "I'm basically done arguing with you because it's obvious that you want to win this argument. So, you know, if you want to be part of this criteria development, have at it." And so we walked away from each other. Relationship wasn't broken. You know, still respected me as a facilitator, and as we're getting back into the main space--'cause we were pulling people together to get them ready to do this assignment--he stops me and he says, "You know, I've changed my mind. I'm not gonna be part of it." I said, "Okay," and so I proceeded to set up the assignment, send people out, and then I found him and I said, "Would you mind telling me what changed your mind?" And he said it was ego. He said, "That conversation between you and I was all about ego," and he said, "I have to be better about managing my emotions, and I have to be better about managing my ego, and sometimes I need to exercise a bit more humility." And he actually went back to the other conversation. He said, "You know, I had a situation where I went at somebody on my team pretty hard, and that wasn't a good thing. And I did the same thing to you, and that wasn't a good thing." So in that small little microcosm you had somebody who was on the one end, you know, really not aware. Like, self--maybe self-aware, you know, using the DiSC assessment as his form of awareness, but not aware of how he was behaving would impact others, right? Really not understanding the emotions that he was generating based on how he was interacting, and he actually--the pendulum actually swung for him, you know? So I don't know when it happened, how it happened. I don't know if I had anything to do with it. You know, maybe it was just the switch flipped, and he was--you know, all of a sudden he had the ability to say, "You know what? I really need to take a step back and look at how my behavior and how I'm managing my emotions and how I'm using my emotions is actually impacting others," you know? And I think that's an important point, and I'm sorry to just prattle on, but, you know, emotional intelligence is a skill. It's something that you can develop. It's something that you can learn, and a lot of times one of the ways we learn is by reflecting, self-reflection, on the situations that we've been presented with, how we've responded, how we've behaved, and how we might change or how we might do things differently.Zach: As you know, our show focuses on people of color in the workplace, like their experiences and perspectives and really having authentic discussions around that idea and around that identity. So I would posit minorities have more pressure to be self-aware by the nature of them just being minorities, by the nature of them being--David: [inaudible].Zach: Right? The smallest group in the space. There's pressure, or there's an expectation that we just need to be more self-aware. 
So what advice would you give to a people group who's already aware that they are the minority when it comes to growing and developing emotional intelligence?David: Yeah. You know what? We could--how much time do we have? Man, [laughs] because--so I think about that a lot, and maybe some historic context here. This idea that we, because we have been so excluded as people of color from institutions of--I mean, call it whatever. Learning. Institutions of earning. You know, social institutions. We've always been in positions where we've had to extend the olive branch, or if I think about the middle ground, we're always crossing that middle ground, do you know what I mean? Like, we're always expected to reach further and reach farther because these institutions have been established before us, and they weren't designed with us in mind, right? And it's--you know, if we want entry into them, you know, we're the ones that have to make the choices and decisions about how to interact with people. It's almost like we have to present ourselves in ways that make it okay for people to accept us, right? Which is an emotionally charged conversation, and again, we could spend, you know, four, five, eight podcasts. It's an ongoing conversation, right? So I don't disagree with you. I think we have to be, as people of color and as a minority group within, you have to be extremely self-aware, number one about your emotions, because there's a lot that could trigger you, you know? And understanding what your triggers are and understanding intent behind what people say or how they interact with you, being able to manage your emotions. It's a skill you have to have, you know? I would almost say forget about excelling, right? Forget about the idea of being promoted or moving up in an organization. I mean, talk just surviving, right? So think about being on projects. Think about being part of teams. How do you, as someone coming to this already in a position where, you know, people have perceptions of you whether or not we're welcome, whether or not we're able to perform at the same level. How do you manage that and then still do your job? I think emotional intelligence is something that you absolutely have to have. Without that, you know, this business will chew you up and basically spit you out. And it's not just EQ, Zach. You know, it's not just emotional intelligence. It's almost like you have to have some social awareness, you know what I mean? Like, you have to--you have to have a bit of empathy, a lot of empathy. You've got to really understand, you know, the organization, you know what I mean? You really have to know where you're working and who you're working for, and in that self-management, you know, how to be--how to control yourself in what can be emotionally charged situations. It's critical, you know? The only way that you're gonna succeed, you know, is if you have a strong sense of, you know, social EQ or social IQ and emotional intelligence. I read something--you know, this guy Daniel Goleman, which--I mean, his model of emotional intelligence is one that's been around for a really long time, you know? He said, "IQ is only 20% of it." Right? EQ is 80%, and I would--I'd offer that social IQ is key. So I don't know if I answered the question completely. You know, I'll get back to the advice. The advice I would--I would give to folks is, you know, you want to position yourself with mentors who have been successful navigating this organization, you know? 
They haven't moved up into leadership positions by accident. There's something that they're doing right, and whether it's, you know, that they have a highly evolved sense of self or they have a really highly evolved ability to perceive social and emotional situations, you know, you want to find mentors who can actually coach you on how to navigate some of these situations 'cause they're gonna repeat themselves, you know? And if you get good at handling them, you know, I think that is what positions you to do well in this organization. Now, that doesn't change the fact that there's some messed up stuff that goes on out there, right? I mean, let's just be real. You know, we have to deal, as people of color, as the minority group in an organization, there are some folks who, you know, quite frankly may not care whether we succeed or not, right? And that's just the reality, and part of what we deal with I think is, you know, our ability to understand who's in the room. You know, maybe the position that they're holding in terms of, you know, does this person care about me as a person or not? Does it matter, right? And then what do I do with that, right? So that's my emotional intelligence, right? My ability to be reflective, you know? My ability to notice my emotional self within a work situation, you know? My ability to evaluate those situations and really begin to notice patterns, right? And then if you notice the patterns, you might start to see some opportunities for you to do something different. Zach: So you've given advice around what people of color and underrepresented groups in Corporate America can do to really develop or continue to sharpen their emotional intelligence and their social IQ. I'm curious, what advice would you give to the C-Suite regarding emotional intelligence and those who seek to be more ethnically inclusive and more welcoming so that they can actually acquire or procure the talent that they're looking for from these ethnically diverse spaces? David: Yeah. That's a multifaceted conversation, right? I think, you know, leaders that are looking to be more inclusive, first of all you have to have a high degree of EQ, right? Your sense of self needs to be very, very strong. You also have to--and within that sense of self, I think it's understanding your intent. Like, what's my intention? You know, is it checking a box? Do I really believe that involving and having a diverse workforce is gonna be advantageous, not only to the things that I touch but to the broader organization? You know, that sense of self is critical, and I would offer something else. It's not just emotional intelligence, it's not just social intelligence, but there's this thing. I don't know if you've heard of this, but the empathy quotient too. Like, your ability to put yourself in the shoes of others, right? Your ability to really walk a mile in the shoes of somebody else, you know? That whole idea of active listening and understanding the intent with which someone is communicating to you, you know? What's the message behind the words? I think--you know, I'm not part of the C-Suite, you know? And I think anything that I'm offering is really just what I've observed in terms of what's really been successful for people looking to be more inclusive. You know, you've got to be awesome at problem solving, and I think the combination of those three things--you know, the social intelligence, the emotional intelligence, your empathy quotient--helps you solve problems, you know? 
You've got to provide and be a supportive communicator. I think you have to be able to be flexible and be able to communicate with different types of folk. That's just the bottom line. You've got to be confident, you know, truly in empowering people, you know? A to B is always gonna be A to B, but the road may look completely different than you thought, and when you're involving diverse populations in a workforce, you know, you have to believe that the road to get from A to B may be something different just based on the types of people that you get involved, you know? And, I mean, I think in terms of attracting folks to work in a situation, you know, where we work, in this corporate environment, you know, you have to do your best to provide an opportunity and to provide and create an environment where people can contribute and add value, and the only way that you can do that I think is if you have a high degree of not only how you lead, right, but the environment that you want to create, and you have to model that behavior, right? You've got to make sure that no matter what it is, whether it's problem solving, whether it's managing conflict, whether it's how you empower others, whether it's how you communicate, whether it's how you motivate people, you know, I think as a leader, modeling that kind of behavior, that inclusive behavior, and modeling the fact that you need to have a high degree of emotional intelligence, a high degree of social intelligence, a high empathy quotient, you know, that's what makes people want to work with you, right? You know this, Zach. People don't leave jobs. They leave people, right? So the work that you can do on yourself, you know, to become more self-aware, it's gonna be reflected in your leadership style, right? The work that you do to become and increase your emotional intelligence, your empathy quotient, your social IQ, it's gonna be reflected in your leadership style, and people are gonna want to work with you, you know? They're gonna want to be part of an organization, you know, especially if you're modeling that behavior.Zach: Man. David, this has been a great conversation, man. Before we wrap up, do you have any parting words and/or any shout outs?David: Wow, shout outs? You know what? Here's the thing. I want to give a big shout out to the A3 posse at Capgemini. Doing incredible work, and a shout out and an apology, right, that I am not more involved. It's one of my goals this year to make myself, as part of the senior leadership of the organization, a bit more present, but I notice and I pay attention, and it's a potent group. Anybody out there who's listening who's not part of A3, you definitely want to get involved because they are doing great things to not only represent within this broader organization but it's a great resource, and it's just nice to be able to have conversations at times with people who speak the same language, who are going through the same things, you know, as we are as people of color trying to navigate, you know, this corporate environment. And I also want to thank you, Zach. I think Living Corporate is a step in the right direction, you know? The more that we can start talking about these things, the more that we can start to talk about the stuff that matters to us as people of color, especially in this day and age, without getting too political. You know, we recognize the times that we live in, and so it's extremely important that we hunker down and that we empower ourselves, right? 
With the tools that we need, with the kind of support that we need. You know, surround ourselves with the mentors that we need so that we can succeed, you know? And so that we can thrive, and ultimately so that we can definitely survive. So thank you, Zach. I can't--you're doing great work, brother. I want you to keep it up.Zach: Man, I appreciate it, David. And absolutely, man. Shout out for those who are listening. A Cubed is an African-American employee resource group at Capgemini, a great resource for black folks to come together and really, to David's point, really a strong point of relation and community within the community. So definitely shout out to A3, shout out to A Cubed. Shout out to Janet Pope, who was on the show before. I know that she leads that group. And David, man, thank you again for the love, man. We want to make sure to have you back, and we appreciate it, dude. We'll talk to you soon.David: All right. Zach, thank you very much.Zach: All right, man. Peace.David: Peace.Ade: And we're back. Zach, that was a great interview. I really appreciated his candid tone and vulnerability. I also really appreciated his stories around facilitating and managing personalities as well. I'm just out here trying to manage myself [inaudible].Zach: Right. In my experience in working with David, it's amazing to even just see it in action. I appreciated his points around being reflective and being able to interpret emotions and move accordingly.Ade: Well, he talked about emotional and social intelligence being what helps you solve problems. That really resonated with me because in my own head I get really, really nervous about dealing with people or being at work and having the right answer, and I've been noticing that when I take a breath and think through how I feel as well as those around me, beyond the X's and O's, the zeroes and ones, I'm able to arrive at a solution that actually works. To me, that's the simplest hook for the why behind why emotional and social intelligence might be a focus. They help you solve problems, and who doesn't want to be good at solving problems? With that being said, unless you have any further thoughts, let's get into our Favorite Things. How do you feel?Zach: No, that's awesome. Let's do it. So my favorite thing right now has to be DeRay Mckesson's book The Other Side of Freedom. I was really excited when he announced the fact that he was--he was almost finished with it, and so I preordered it, and I've been waiting, and it dropped on my birthday, September 4th. So I'm, like--I'm just excited to read it. I haven't really gotten fully into it yet, but I finished the intro, and I'm loving what I'm reading so far, and I can tell already that it's a favorite.Ade: So I'm confused. You said September 4th. Do you mean Beyonce's birthday? [Sound Man throws in car slamming on its brakes effect]Ade: Beyonce? Her birthday?Zach: I mean my birthday, and listen, I've been on this earth long enough now to realize that, yes, it's B Day. I get it, but, you know, it's my birthday too, okay? Beyonce does not own the day.[car slams on its brakes again]Ade: She does, because as you said, it's B Day, not Z Day. Which, you know, cool. You can have, like, September 5th or something, but September 4th is B Day. So, like, I guess you can rent September 4th. It's fine. It's fine. We'll be nice.Zach: [laughs] Okay. We might have to subtitle this show (B?) Happy Z Day. That would be kind of funny. 
We might do that.[again]Ade: Why not B Day?Zach: [sighs] Why don't we go ahead and go to your favorite things? How about that?Ade: All right. All right, okay. I'm gonna stop frustrating you. All right, so my current favorite thing is this book called The Storied Life of A.J. Fikry. Now, it is purely a work of fiction. It is comedy and it is drama and it is a tragedy, and if you're the sort of person who likes an emotional rollercoaster with your literary works I certainly recommend that book. My second favorite thing, because I can never choose just one, is this, like, nifty invention called a water bottle. I've been training for a marathon again, and I don't know how much you know about training for marathons, but they suck. The training sucks, the marathon sucks. I don't know why I'm doing this. Somebody help me. But water bottles have been saving my life so far, so there's that upside. Yay.Zach: [laughs] Okay. Well, yeah, definitely shout out to the book, and shout out to water bottles, you know? My wife, she just recently toured Route 66.Ade: Aye!Zach: Yeah, and one thing I remember I told her--I was like, "Listen, make sure you have water," and she said, "I will in my water bottle." So yes, shout out to water and shout out to Favorite Things, and as a reminder, to see all of our favorite things, go to our website, living-corporate.com, and click Faves. You'll see all of our favorite things for the season right there. Make sure you go check it out.Ade: Yep. And that's our show. Thank you for joining us on the Living Corporate podcast. Please make sure to follow us on Instagram at LivingCorporate, Twitter at LivingCorp_Pod, and subscribe to our newsletter through www.living-corporate.com. If you have a question you'd like us to answer and read on the show, please make sure you email us at livingcorporatepodcast@gmail.com. Also, don't forget to check out our Patreon at LivingCorporate as well. We're Living Corporate everywhere! That does it for us on this show. My name is Ade.Zach: And this has been Zach.Ade and Zach: Peace.Kiara: Living Corporate is a podcast by Living Corporate, LLC. Our logo was designed by David Dawkins. Our theme music was produced by Ken Brown. Additional music production by Antoine Franklin from Musical Elevation. Post-production is handled by Jeremy Jackson. Got a topic suggestion? Email us at livingcorporatepodcast@gmail.com. You can find us online on Twitter, Facebook, Instagram, and living-corporate.com. Thanks for listening. Stay tuned.
Today, we're talking to Jeff Barr, vice president and chief evangelist at Amazon Web Services (AWS). He founded the AWS Blog in 2004 and has written more than 2,900 posts for it and another 1,100 for his personal blog. As chief evangelist, Jeff strives to explain the benefits of cloud computing and web services to anyone who will listen. Jeff is the voice of AWS. He does what he does best: he exploits his superpower of explaining technology in ways that people can understand. Jeff tries to be the same person all the time. He loves to meet people and goes out of his way to say "Hello." So, if you see him at re:Invent, say "Cheese" and take a selfie with him! Some of the highlights of the show include:
- Jeff uses AWS Workspaces for his blog; one of Jeff's blogging principles is to not take anybody else's word for anything, to the absolute best of his technical ability.
- Zero client: Jeff has no rotating hardware, no disk drives, just a zero client; wherever he is, it's the same workspace.
- AWS has something for everyone; it builds things in response to customers' questions, requests, and feedback.
- Naming services and products: Is it helpful? Is it descriptive? Does it have any hidden meanings?
- Amazonian DNA and a dog-friendly workplace: Jeff went from super fearful, to accepting, to now thinking of dogs as incredible creations because they add fun and excitement to the office.
- As part of hiring, each interviewer is assigned Amazon leadership principles (LPs) to ask questions that measure a candidate against those LPs.
- What is the secret to getting hired at Amazon? Study the LPs to understand what they're about and be able to express your philosophies and history with LPs.
- re:Invent makes sure customers understand services: What is it? What does it do? How do they put it to work? What are the best use cases for it?
- Things can never be too simple; you start from zero, put a lot of different things in there, and then you need the feedback to build in simplicity.
- AWS is following a more on-demand approach than traditional reserved instances; it opens the door to being used in a lot of ways.
- AWS does a lot of work before a launch to make sure it's got infrastructure, scaling, monitoring, and capacity in place.
- If you are a customer, talk to AWS and let them know what they're doing right or wrong; write a blog post, tweet about it, share it with them in some way.
- Is the breadth of product offerings from AWS too vast? Is it offering too many things?
- AWS was not explicit about where it was going with cloud computing and didn't do analyses or projections about it; it simply launched SQS and let it speak for itself.
- Customer feedback shapes what Amazon works on; customers share, and then AWS re-prioritizes to make sure it's delivering the right thing at the right time.
- Remember: it's not just bits and bytes, it's about the organic life form.
Links: Jeff Barr on Twitter | Jeff Barr on LinkedIn | AWS | AWS Blog | Jeff Barr's Blog | Amazon Machine Images | Zero Client | AWS Workspaces | AWS Lambda | Amazon Leadership Principles | re:Invent | The Robot Uprising Will Have Very Clean Floors | Serverlessly Storing My Dad Jokes in a Dadabase | Days Until re:Invent
Dan Slevin reviews the second film by Pasifika writer-director Stallone Vaioga-Ioasa (now known as SQS).
RR 323: Queuing and Amazon SQS with Kinsey Ann Durham
This episode of Ruby Rogues features panelists Charles Max Wood, Dave Kimura, and Eric Berry. Special guest Kinsey Ann Durham joins to talk about queuing and Amazon SQS. Tune in to learn more!
[00:01:19] Kinsey Ann Durham. Kinsey writes code for a company called Go Spot Check. She is also a lead mentor at a San Francisco-based company called Bloc.
[00:02:50] Background on Amazon SQS. Go Spot Check is using Amazon SQS on a smaller scale. Kinsey thinks it is easy to use. She recommends using something like Amazon SQS or even RabbitMQ. It has provided the company with the ability to explore different architecture patterns and tools.
[00:04:50] Can you talk a little about your company and what led to using Amazon SQS? Go Spot Check is a startup in Denver. They focus on recording and data collection for big companies that need to know what is happening in retail, grocery stores, and bars. The focus is on alcohol and retail brands. The company analyzes collected data that previously held no insight. Go Spot Check is currently moving into computer vision. Kinsey works on a separate service off of the main Go Spot Check app.
[00:06:46] What does your stack look like? Is it built on Ruby? Yes, it is a Rails API only. The computer vision is done in Python.
[00:08:45] Are you feeding the images through the queue? How does the queuing fit in? They started using Amazon SQS because they wanted a more decoupled way of developing. This allowed them to decide the contract between the two services and what they wanted it to look like up front. Kinsey describes how easy it is to create fake messages for testing with Amazon SQS. Image data is sent back and forth through the queue. The company does a lot of planograms. Information is taken from that data and posted onto a queue from the machine learning side of things. On the Rails side of things, the data can be picked up in the API and sent back to the main app.
[00:10:50] Does it accept binary data in the queue? It does not send actual images. All comparison data that has been processed is sent from the machine learning side of things. An article has been published that shows that people do send images in the queue.
[00:11:35] Do you use SQS in parallel with SNS (Simple Notification Service)? Kinsey says that they haven't used SNS because there hasn't been a need. They are using SQS to post messages to communicate between different services.
[00:12:40] At what point would you consider SQS over Sidekiq? Kinsey didn't look into using Sidekiq; she was excited to use SQS. She wanted to try it out and see if it was easy to use, and thought it would be more complex than it has been. She enjoys the features Amazon handles for free, such as message visibility and timeout. It can be customized, and two different queues can be used.
[00:16:15] How do you write the workers for an SQS queue? Kinsey has a plain Ruby object in the API that she can reuse with any queue. There are three queues in the company.
[00:19:45] Are there any other uses for queues and SQS? Kinsey hasn't come across any personally, but she is sure there are some.
[00:23:40] What if you're someone who is new? Where would you recommend they get started? She suggests getting started with the Amazon SQS documentation. You can get up to speed quickly; Amazon SQS is easy to get up and running. Kinsey is tailoring her Ruby Dev Summit talk to people who are new.
[00:30:35] How do you go about mentoring? Kinsey loves mentoring. Developers have side projects or freelance work, but Kinsey likes to mentor because she feels like she makes a difference while continuing to learn. An important part of mentorship is giving support, which means not only offering students help with technical skills. Her goal is to build a well-rounded developer: someone who will be a great team member and whom people will want to work with in the future. This involves helping students build soft skills such as networking and interviewing, and helping them build confidence.
[00:33:52] How would people get involved with mentorship? Kinsey is involved with an organization called Bloc - they are always hiring mentors. She shares that people can always get involved in their local community. Schools are looking for mentors. Local meetups and RailsBridge are also both good ways to volunteer. Kinsey learned through mentors - she didn't go to school to learn to code. Mentors changed her life and are important to her, which is why she now mentors.
[00:36:30] Advice for women. Kinsey's advice for women who want to work in the technology world is to go for it. She urges women to get as many people and resources on their side as possible, including great developers who are willing to mentor. She emphasizes the importance of confidence and says to be ready for comments on gender. She believes that - while there are definitely still diversity issues around socioeconomic background, sexual orientation, race, gender, etc. - it is getting better: women are more welcome in the technology field than they have previously been. There are technology organizations that are doing well and have no problem welcoming women into the workplace. People in the field need to be open to having discussions about gender inequality. Open dialogue with team members is the key to solving problems. Some people have grown up not realizing the way they think is wrong. They don't connect that what they say or think is offensive because it is all they know; it is unconscious to them. This is the type of person that is hard to change.
Picks
Eric: Open Collective | Open Collective - Women Who Work
Dave: Health insurance
Charles: Profit First | Secrets of the Millionaire Mind
Kinsey: Guide program applications for mentors at RubyConf | Release It
Links for Kinsey: Twitter | Instagram | GitHub
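The decoupled contract the episode describes - one service posts processed results onto a queue, a plain worker object on the other side polls and handles them - can be sketched in a few lines with boto3 (Python here rather than the Ruby discussed in the episode). The queue name and message payload are hypothetical.

```python
import json
import boto3

sqs = boto3.client("sqs")
# Hypothetical queue for results flowing from the ML service to the API.
queue_url = sqs.create_queue(QueueName="vision-results")["QueueUrl"]

# Producer side: post processed comparison data (not raw images).
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"planogram_id": 42, "match_score": 0.87}),  # illustrative payload
)

# Worker side: long-poll, process, then delete. If processing fails
# before the delete, the message reappears after the visibility timeout.
while True:
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        payload = json.loads(msg["Body"])
        print("processing", payload)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```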
Enterprises rely on messaging to integrate services and applications and to exchange information critical to running their business. However, managing and operating dedicated message-oriented middleware and underlying infrastructure creates costly overhead and can compromise reliability. In this session, enterprise architects and developers learn how to improve scalability, availability, and operational efficiency by migrating on-premises messaging middleware to a managed cloud service using Amazon SQS. Hear how Capital One is using SQS to migrate several core banking applications to the cloud to ensure high availability and cost efficiency. We also share some exciting new SQS features that allow even more workloads to take advantage of the cloud.
#vBrownBag US - AWS SA Associate - Domain 1 Part 5 with Alastair Cooke @DemitasseNZ. Objectives: elasticity and scalability, Auto Scaling, SQS, Elastic Load Balancer (ELB), CloudFront.
We compare the asymptotic covariance matrix of the ML estimator in a nonlinear measurement error model to the asymptotic covariance matrices of the CS and SQS estimators studied in Kukush et al. (2002). For small measurement error variances the three matrices are equal up to the order of the measurement error variance, so the estimators are nearly equally efficient.
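Stated compactly (the notation below is ours, not the abstract's): writing sigma^2 for the measurement error variance and Sigma_ML, Sigma_CS, Sigma_SQS for the asymptotic covariance matrices, the claim is that all three agree through first order in sigma^2.

```latex
% Hedged restatement of the comparison; notation is assumed, not from the abstract.
\[
  \Sigma_{\mathrm{ML}}
  \;=\; \Sigma_{\mathrm{CS}} + o(\sigma^{2})
  \;=\; \Sigma_{\mathrm{SQS}} + o(\sigma^{2})
  \qquad \text{as } \sigma^{2} \to 0,
\]
% i.e., any efficiency differences among the three estimators show up only in
% terms of higher order than the measurement error variance itself.
```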
In a structural error model the structural quasi score (SQS) estimator is based on the distribution of the latent regressor variable. If this distribution is misspecified, the SQS estimator is (asymptotically) biased. Two types of misspecification are considered; both assume that the statistician erroneously adopts a normal distribution as the model for the regressor distribution. In the first type, the true model is a mixture of normal distributions that cluster around a single normal distribution; in the second, the true distribution is a normal distribution admixed with a second normal distribution of low weight. In both cases the bias, of course, tends to zero as the size of the misspecification tends to zero. However, in the first case the bias goes to zero so fast that small deviations from the true model lead only to a negligible bias, whereas in the second case the bias is noticeable even for small deviations from the true model.
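The two types of misspecification can be written out explicitly; the mixture notation below is assumed for illustration, not taken from the abstract.

```latex
% Assumed notation: the statistician models the latent regressor X as
% N(\mu, \tau^2), while the true distribution is a normal mixture.
%
% Type 1: many components clustered around one normal,
%   X \sim \sum_{i=1}^{k} w_i \, N(\mu_i, \tau_i^2),
% with all (\mu_i, \tau_i^2) close to a common (\mu, \tau^2).
%
% Type 2: a dominant normal contaminated by a second normal of small weight:
\[
  X \sim (1-\varepsilon)\, N(\mu, \tau^2) + \varepsilon\, N(\mu_2, \tau_2^2),
  \qquad 0 < \varepsilon \ll 1 .
\]
% In both cases the asymptotic bias of the SQS estimator vanishes as the
% misspecification shrinks, but it does so much faster in Type 1 than in Type 2.
```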
A nonlinear structural errors-in-variables model is investigated, where the response variable has a density belonging to an exponential family and the error-prone covariate follows a Gaussian distribution. Assuming the error variance to be known, we consider two consistent estimators in addition to the naive estimator and compare their relative efficiencies by means of their asymptotic covariance matrices for small error variances. The structural quasi score (SQS) estimator is based on a quasi score function constructed from a conditional mean-variance model; consistency and asymptotic normality of this estimator are proved. The corrected score (CS) estimator is based on an error-corrected likelihood score function. For small error variances the SQS and CS estimators are approximately equally efficient. The polynomial model and the Poisson regression model are explored in greater detail.
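As a rough sketch of how a quasi score function arises from a conditional mean-variance model (the notation is assumed, not the paper's):

```latex
% Assumed notation: with observed covariate W = X + U and response Y, let
%   m(w; \beta) = E(Y \mid W = w), \qquad v(w; \beta) = Var(Y \mid W = w),
% computed under the assumed Gaussian model for the latent X. The quasi score
% is the estimating function
\[
  \psi(y, w; \beta)
  = \frac{\partial m(w; \beta)}{\partial \beta}\,
    \frac{y - m(w; \beta)}{v(w; \beta)} ,
\]
% and the SQS estimator \hat{\beta} solves
%   \sum_{i=1}^{n} \psi(y_i, w_i; \hat{\beta}) = 0 .
% Because m and v are derived from the latent distribution, the estimating
% equation is unbiased and the estimator is consistent when that distribution
% is correctly specified.
```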
We consider two consistent estimators for the parameters of the linear predictor in the Poisson regression model, where the covariate is measured with error. The measurement errors are assumed to be normally distributed with known error variance sigma_u^2. The SQS estimator, based on a conditional mean-variance model, takes the distribution of the latent covariate into account; this distribution is here assumed to be normal. The CS estimator, based on a corrected score function, does not use the distribution of the latent covariate. Nevertheless, for small sigma_u^2, both estimators have identical asymptotic covariance matrices up to the order of sigma_u^2. We also compare the consistent estimators to the naive estimator, which replaces the latent covariate with its (erroneously) measured counterpart. The naive estimator is biased but has a smaller covariance matrix than the consistent estimators (at least up to the order of sigma_u^2).
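The abstract's qualitative claims are easy to see in a small simulation. The sketch below is ours, not the paper's method, and all parameter values are arbitrary: the naive Poisson fit on the error-prone covariate is visibly attenuated, while a simple structural correction that uses the normal latent distribution (regression calibration, in the spirit of the SQS approach) recovers the true slope.

```python
# Simulation sketch (not from the paper): bias of the naive Poisson estimator
# when the covariate is measured with normal error. Parameter values are
# arbitrary choices for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 0.5, 1.0          # true linear predictor: beta0 + beta1 * x
sigma_u2 = 0.25                  # known measurement error variance

x = rng.normal(0.0, 1.0, n)                      # latent covariate ~ N(0, 1)
w = x + rng.normal(0.0, np.sqrt(sigma_u2), n)    # observed, error-prone covariate
y = rng.poisson(np.exp(beta0 + beta1 * x))       # Poisson response

# Naive fit: replace the latent x with its mismeasured counterpart w.
naive = sm.GLM(y, sm.add_constant(w), family=sm.families.Poisson()).fit()
print("true slope:       ", beta1)
print("naive slope:      ", naive.params[1])     # noticeably attenuated

# For a normal latent covariate, E(x | w) is linear in w, so fitting on the
# calibrated covariate removes the slope attenuation -- one simple structural
# correction that uses the latent distribution.
lam = 1.0 / (1.0 + sigma_u2)                     # reliability ratio Var(x)/Var(w)
x_hat = lam * w                                  # E(x | w) when E(x) = 0
calib = sm.GLM(y, sm.add_constant(x_hat), family=sm.families.Poisson()).fit()
print("calibrated slope: ", calib.params[1])     # close to the true slope
```

With these values the naive slope comes out near beta1 / (1 + sigma_u2) = 0.8, illustrating the attenuation the abstract describes, while the calibrated fit returns a slope near the true 1.0.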