POPULARITY
BONUS: What the UK COVID App Project Taught Us About Remote Agile Collaboration: A Conversation with Giovanni Asproni NOTE: We want to thank the folks at Tuple.app for being so generous with their stories, and supporting the podcast. Visit tuple.app/scrum and share them if you find the app useful! Remember, sharing is caring! In this special BONUS episode, Giovanni Asproni, CTO and co-founder of Launch Ventures, takes us behind the scenes of his work on the UK government's COVID-19 app project. Giovanni shares insights into the rapid development process, the adoption of remote pairing and ensemble programming, and valuable lessons on leadership in large-scale, high-stakes projects. Giovanni also offers practical advice for teams embracing remote collaboration and agile methodologies. The Start of a Crucial Project "When we got the call from the UK government, we knew this was a mission to help stop the spread of the virus." Giovanni kicks off the episode by recounting how his team at Zühlke Engineering got involved in the development of the UK's COVID-19 app. Tasked with the challenge of building a solution that could help prevent the spread of the virus, they were under immense pressure to deliver quickly. Giovanni explains how they tackled technical hurdles, such as using Bluetooth technology to assess contagion risks, despite the lack of existing APIs on iOS and Android at the time. "Speed was essential, but we also needed a robust design—Bluetooth was key to evaluating contagion risks, even though we didn't have the APIs we needed." Overcoming Rapid Release Challenges "We had to move fast, but accessibility and coordination were non-negotiable." Giovanni discusses how the team, which consisted of around 60-70 members spread across the globe, used agile methodologies to stay organized and on schedule. By fostering open communication and using a clear team structure, they were able to streamline development. Agile planning and strong leadership, including cross-team coordination, were crucial to staying on track. "Agile was our backbone—every team knew their responsibility, and clear communication meant we could deploy with confidence." The Power of Remote Pairing and Ensemble Programming "Pairing allowed us to maintain quality under immense pressure." Giovanni dives deep into the practices of remote pairing and ensemble programming (or mobbing), which were introduced to enhance code quality and resilience during the project. With team members working remotely and under heavy scrutiny, mobbing provided a social outlet and improved problem-solving, while tools like Tuple made remote collaboration seamless. He reflects on the success of these practices, highlighting their impact on efficiency and team morale. "We embraced mobbing not just for resilience, but to stay connected in a time when social contact was scarce." Advice for Remote Pairing Beginners "Don't overthink it—just start and take breaks!" For teams new to remote pairing or ensemble programming, Giovanni offers simple but effective advice: give it a serious try, take breaks to avoid burnout, and don't overcomplicate the process. He emphasizes that these practices can dramatically improve productivity and team cohesion if executed well. "Take the plunge—remote pairing can feel awkward at first, but the benefits are worth it." Key Lessons on Remote Work and Collaboration "Don't try to recreate the office—remote work offers unique advantages." Reflecting on the lessons from the COVID-19 app project, Giovanni explains how remote work is not just a substitute for office work but an entirely different mode of collaboration. He warns against trying to replicate office dynamics remotely, and instead, encourages teams to embrace the benefits of remote settings, such as easier scheduling and fewer distractions from management oversight. "Remote work isn't about replicating the office—when done right, it's a whole new way to collaborate." Resources for Learning More "Explore the power of mob programming with these great resources." To wrap up, Giovanni shares a few key resources for listeners who want to dive deeper into remote pairing, ensemble programming, or leadership in software engineering. He recommends "Software Teaming" by Woody Zuill and the Remote Mob Programming website, which offers comprehensive guides and tools. You can also find out more about Giovanni's work at his company website: https://www.asprotunity.com. During the episode, Giovanni mentions a network of consultants, which you can access at: https://www.clockwork.ing. And the podcast Giovanni hosts is the Software Engineering Radio podcast. About Giovanni Asproni Giovanni Asproni is a consultant, CTO, and co-founder of Launch Ventures. He is an expert in agile development, software design, and modern software engineering practices. Giovanni is a host for the Software Engineering Radio podcast and a frequent speaker at international conferences. You can link with Giovanni Asproni on LinkedIn.
Dr. Daniel Zingaro and Dr. Leo Porter are co-authors of the book Learn AI-Assisted Python Programming. Leo will teach an introductory computer science course this quarter at UCSD using this book. We discuss how tools like GitHub Copilot let people new to programming focus on breaking down problems instead of language syntax. Dr. Zingaro is an Associate Professor of Computer Science at University of Toronto Mississauga and Dr. Porter is an Associate Professor at University of California San Diego. This episode was originally posted on Software Engineering Radio. Topics covered: Making programming more accessible Teaching problem decomposition instead of language syntax The importance of reading and testing untrusted generated code The rise of throwaway or one-off code Concerns about relying on commercial tools Rethinking how to assess students Related Links Learn AI-Assisted Python Programming Leo Porter Daniel Zingaro GitHub Copilot Transcript You can help edit this transcript on GitHub. Note the timestamps and audio for this transcript will not completely match. Intro [00:00:00] Jeremy: Today I'm talking to Dr. Leo Porter. He's an associate teaching professor of computer science at the University of California San Diego, and he co-founded the computing education research laboratory there. I'm also joined by Dr. Daniel Zingaro who is an associate teaching professor of computer science at the University of Toronto. And he's also the author of the book, learn to Code by Solving Problems and the Book, Algorithmic Thinking. They are co-authors of the book, learn AI Assisted Python programming. Leo and Dan, welcome to Software Engineering Radio. [00:00:37] Leo: Thank you for having us, Jeremy. I really appreciate your podcast, so thanks. Great to be here. [00:00:41] Dan: Thanks Jeremy. Writing a book for Leo's CS1 class [00:00:43] Jeremy: The first thing we could start with is, is why this book? And, and why now? How did you decide on like, okay, this is the thing we need to do now. [00:00:51] Leo: So, uh, this is Dan. Uh, so Dan, um, like really early when LLMs first kind of were coming out and being seen on the scene for programming, uh, he started playing with them, uh, for programming projects. And I think Dan really quickly realized that they'd had this, a big impact on how we teach programming. so he reached out to me, uh, and said, I really need to give em a try. And, uh, after I played with them for a little while, I had the exact same realization that this is gonna change, uh, how we teach programming, uh, in a pretty dramatic way. So having realized that, having realized that we had to change our, uh, introductory CS1 courses, we knew we needed to do that, but in order to teach that class, we'd have to have a book that we could assign our students that that would go along with the class. And so we knew we had to change the class, but we also knew we had to have a book for it. And given the, the timeline to write books, we started in the book first. Um, and so that's how it got started. LLMs for Syntax, Humans for breaking down problems [00:01:45] Dan: I guess we figured out that our course had to change first, before we knew exactly, um, how it had to change. One thing we, um, learned early on was that the kinds of assignments we give in our introductory courses, they're just solved by, by these tools like ChatGPT and copilot. So, uh, we knew something had to change, and then it is just a matter of figuring out what. And so we spent, um, quite a bit of time with these tools and we started to realize that what's gonna change is the skills that our students need to learn, uh, to be effective using these tools. So like b before these tools, we would spend a lot of time teaching syntax. Um, and students struggle quite a bit with learning syntax, which I mean, it's very, it's, it's very frustrating, right? Cuz you can't even do anything until you get the syntax right? And you're getting all these errors like missing colons and, you know, mismatched braces and stuff like that. Uh, so it's actually good, that, the LLMs are doing the syntax for the students. But you know, just because that skill's, uh, not needed as much, uh, doesn't mean that there aren't still skills for students to learn. So instead of syntax, other things become more important. Uh, so for example, uh, Leo and I, realize that reading code is gonna be extremely important even more so than before. I think if, if that, if that's even possible. Uh, and that's because sometimes you're gonna get back code that just doesn't work. And so we realized that students are gonna need to be able to read, the response that they get to see if the code looks reasonable, or not, right? And then if the code, uh, I is unreasonable, then they need to read more code, uh, and look at other solutions, right, that they get from the, uh, LLM. Uh, there are other, uh, things they can do as well, like messing around with the prompt and so on. But they're gonna need to be able to read code, uh, throughout the process. And then, so we just kind of kept on using these tools and documenting the skills that students are gonna need. And we just kinda realized that all the skills students are gonna need are skills we would want to teach anyway. So like, uh, one more example is testing, right? So, students may now not have, uh, an understanding of every last detail of, you know, the Python language like they would before. And so then that makes testing even more important, right? Than it was they need to verify that the code they're getting is correct. And so they have to be very good at writing test cases. and, and, you know, similar, similar for debugging, we need our students to have strong debugging skills, again, even potentially stronger than before, right? Because if the code isn't working, they need to first determine what the code is doing to be able to fix it. And then I guess one more I'll mention is problem decomposition. And this is a big one. I think this is gonna come up a couple times probably in our talk today, but LLMs struggle when you give them tasks that are too large and students need to know how to break problems down into small components so that, that, LLM can solve each one and, you know, have a good chance of getting it right. [00:04:56] Leo: Yeah, I, I think, um, kind of to, to piggyback off of that, you, you may be hearing these skills and saying, oh, these are absolutely essential skills. Every software engineer should know, uh, these are being taught right now. Right? Um, and the answer is not really, like these aren't core topics in a lot of introductory CS classes because so much time is spent on syntax. And so fairly early on when we kind of realized these skills would be so essential, Uh, we got really excited because these are skills we want to teach in our classes, and the LLMs are now giving us the ability to do that more. [00:05:27] Dan: Mm-hmm. [00:05:28] Jeremy: I think that's interesting about the syntax comment because you were saying how reading is gonna be more important than ever because you have LLM generating the code. Um, and you need to understand that code that's being generated and understand that it does what it, uh, you think it does. And so I wonder if when you say you spend less time on syntax, is it because you feel like they're gonna generate this code and they're sort of organically gonna pick up syntax that way versus having to focus on it at the start? I'm just trying to picture what you see changing there. [00:06:05] Dan: Yeah, Jeremy. So, uh, I, I was, I guess speaking specifically about syntax errors, which don't generally happen when you're using LLMs, and I also agree with you, you need to know what the code is doing, but, um, you can do that without worrying about each specific piece of syntax. Like, um, you're gonna need to know what the keywords do for sure, but, missing, you know, brackets and colons and, uh, oh, there needs to be like a blank line here. indentation, uh, a lot of this kind of thing. Is done for the most part, correctly by the LLMs. So yeah, I agree with you. You need to be able to identify the structures. So in our, in our book actually, Leo and I have, um, a couple of chapters on reading code and, I don't think we ever break breakdown, a line of code into its individual tokens. We do talk about the main structures, like ifs and loops and functions and all that. but compared to other books, I, I think or other, uh, other ways of teaching where you would focus on the micro level, we try to focus on the line level now, cuz we want our students to be able to grasp what each line is doing, I guess more than each token. [00:07:27] Leo: Yeah, maybe to, to add to that a bit, it's almost, uh, if you think about the advent of block-based languages, it was to make sure that the, essentially the, the author can't make syntax mistakes, right? Is the whole purpose of kind of block-based languages. And they're, they're huge for introductory programming, especially in like K through 12. in a sense, LLMs do this because they'd never give you back wrong syntax, or they almost, almost never give you back wrong syntax. And so it takes away that kind of cognitive burden of making sure you handle the, the token level. as uh Dan was saying LLM generated code needs test cases to catch logical errors [00:08:00] Jeremy: I, I'm curious, so you said the syntax is correct, but what are the, the typical mistakes you see coming back from these LLMs? Is it a, a logical mistake or is it ever something that. Actually doesn't compile. I'm, I'm kind of curious what your experience has been. [00:08:19] Leo: I think the, uh, more common errors that we've been seeing are logical. So it misinterprets the prompt that you're giving it. It essentially tries to solve a problem that's different than what you're trying to solve. It may have bugs in it, so it is in fact trying to solve the right problem, but it, it's off by one, um, is maybe replicating some mistake that it found in, in the large code base. And so most mistakes are gonna be you need to write test cases, run it. That mistake is then gonna show up when the test cases catch it, and then you'll have to try to fix it. if the students can read the code, uh, if we train them well to read the code, often you'll look at the response. And if the response is just not even trying to solve the right problem, you can usually pick that up pretty quick. Uh, and I think, I think the students will be learn to do that and then they can just say, okay, this is clearly not the right answer. And, and use the different tools in say vscode to find another answer, and then pick one that's right or change their prompt to get a response that's right. Go through that whole flow. But then some point or other it will give an answer that looks right. And then I think all of us as software engineers know that even the code looks right, it may not be. And so then they have to actually write the test cases, get some level of confidence that's actually working right before they'll know. And so sometimes, sometimes, you know, really quick is that it's just clearly wrong at solving the wrong problem. And sometimes it looks right, but it actually has some bugs that need to be fixed. [00:09:49] Dan: I guess one thing that struck me is how much a change in the prompt can, can matter. Uh, Leo, you know, um, we've, we've seen this over and over again where we'll write a prompt. It seems fine to us. And then we'll realize, oh, there are actually two different ways of interpreting this. and, uh, the ambiguity of, of English strikes again, right? And so it's just amazing to me how clarifying the prompts, how many times that fixes the code. Not always. We've definitely have examples where that's not the case, but, um, more, more often than not, in my experience, changing the prompt, uh, appropriately has a bigger than, than, um, anticipated effect on the, on the code. It's amazing. [00:10:36] Leo: And for thinking of the prompt, uh, in terms of like doc strings for functions, uh, adding the test cases certainly help. Um, sometimes it is, surprising sometimes that you can add the test cases to the prompt and it'll still give you back code that does not actually pass that test case because it, vscode and copilot doesn't actually run the code that comes back from the LLM. Uh, but I do find the test cases do tend to help with the quality response you get back. [00:11:01] Jeremy: As a part of your prompt, you're asking it to implement some functionality, and you're also asking it to write these tests for that same functionality? [00:11:11] Leo: Oh no, sorry. I, I, it's more the, um, doc test kind of format. So it, it, um, you're writing, let's say you, you've written your function signature and then you have the description of the function in a doc string. And then at towards the end of the doc string, I'm articulating the test cases that I intend to use. Um, and the articulating the test cases that I intend to use helps it come with a better prompt. Um, I haven't found it to be great at writing test cases. I haven't spent a ton of time with this, but the time that I have spent, it tends to want to do almost like a brute force search of all possible inputs, uh, as opposed to doing, okay, well here's a couple common. Here are the edge cases. Now I can feel fairly good about it. It doesn't seem to have that, um, intuition yet. [00:11:55] Jeremy: [00:11:55] Leo: For the most part, we're writing the test cases our ourselves, and we're gonna be teaching the students how to write the test cases themselves [00:12:01] Dan: Yeah, Yeah. So Leo and I have actually made a conscious decision to have students write test cases from scratch. Even though you could play around with the LLM and have it, you know, try to generate test cases, whether it's flawed or not, we still want students to do this from scratch. We think that writing test cases is a skill we want our students to have. [00:12:23] Jeremy: Sometimes what these models will generate, like you were saying, has logical errors. And hopefully if you're writing the test cases, you've put some thought into 'em, and your test cases are actually checking the correct behavior. So then you have the LLM generate the implementation. It's running against tests where you know what the correct answer should be. And so if it generates something that's incorrect, you've, you've kind of caught it. You're not totally relying on it. Telling you everything is, is good, you know? Um, It's confidence in something that's like you personally can't see. It's just what the machine gave you. [00:13:05] Dan: Maybe it takes away one layer of uncertainty too, Jeremy, right? Like, so the code could be wrong, right? And then if it generates test cases, okay, the test cases could be wrong too. And maybe you get unlucky and two wrongs make a right and then your test cases pass for the wrong reason. So yeah, we really wanna hone this skill in our students. And, and like Leo said earlier, these intro courses used to be so full of low level syntax concerns that we, we didn't do testing properly. I mean, you know, we all try to cover testing, but I think we're gonna be able to cover it a lot more, detailed now. LLMs could encourage students to test more since their output is untrusted [00:13:41] Leo: And I, I think we're enthusiastic about, uh, how students will approach testing when you're working with the LLM is what we. This is fairly anecdotal, but uh, when they interact with us talking about testing, often students aren't testing their code because they wrote it. And so of course it's Right. Right. This is like this really famous, uh, kind of bug in human thinking, right? Is that if you write it, of course the computer's gonna interpret what you're saying, right? Um, and so students tend to trust their code in a way that professional software engineers never would. and I think because it's coming from this third party that you know is wrong, it's coming from the LLM that can, that can often make mistakes. I think they're gonna be more inclined to actually engage in those testing practices. Uh, kind of knowing about the fallibility of the LLM, [00:14:27] Jeremy: You're shifting the order. I mean, there is test driven development that some people practice, but I feel like probably what's most common is you write the implementation yourself and then, then you'll go and see like, oh, did this thing I, I wrote. Did it do what I thought it should do? Um, whereas this is kind of flipping it, where it's the large language model is gonna write my code, so I'm just gonna start with the test and then I'll ask it to, to write me the code. And maybe that will kind of make test driven development be the default. [00:15:02] Leo: So yeah, I, I, I think that students may wanna engage more in kind of test driven development because they wanna think more about, uh, what exactly should this function be doing? Uh, how should behave, what kind of inputs and output should it expect? And then it can kind of write the prompt to co-pilot or whatever LLM is using, uh, to express those inputs and outputs. Well, they're more apt to get good answer from the LLM and they've kind already got their test cases worked out as well, so they can immediately just go right into the testing agency if the prompt came back right. Using LLMs at the function level instead of a broader scope [00:15:35] Jeremy: And you mentioned writing a prompt to implement a specific function. Have you found that they work well at the function level? But if you try to ask it to build something more broad, that that's kind of when it has problems? [00:15:53] Dan: So, I think in general, LLMs do work best at the function level. We have tried to get it to generate bigger apps, collections of functions, and it can work, but sometimes it does, uh, it does do worse. But also we want students to do the problem decomposition for themselves and break up the problem into individual functions. Even though maybe the LLM could work, uh, with, uh, bigger chunks of code, we want students to do it. And one reason is so that they can customize what they get from the LLM. So, in the book, we have a bunch of examples where you could probably just throw it at the LLM and get an answer and, you know, eventually get it to work. But I think at that point, making changes to it might be trickier than it would be if you knew, uh, the architecture of what you were, what you were building. So in the book, we have a bunch of top-down design diagrams, and we want students to understand what they're building at that level, like at the function level instead of, like we said earlier, instead of like at the token level or the line level. Potential issues with outsourcing high level design to an LLM [00:17:03] Jeremy: And so like in this example, you're thinking more from a, a learning perspective. You want the student to look at the big picture, figure out, okay, what are all the different functions or parts of my application? Break that down and then feed those individually. To, um, these large language models. I, I'm wondering from like, let's say you're a, a professional software engineer and your interest is more in I want to make the thing and less so, in I want to learn how to make the thing. in that case, do you feel like you could feel confident in, in giving the large language model a larger piece of the design, or do you still feel like it's good to have that overall structure done by the, the developer and then just be very targeted about how you use the large language model? [00:18:03] Leo: I think that's a tricky question because we haven't worked with these tools heavily in a professional programming setting. I think often when we're thinking about large design of software, you're gonna be working on teams, talking with other members of the team about the interfaces and things like that. And so I'd be pretty hesitant to to outsource that, that thinking to the, the l lm cuz you, the communication between the teams still has to happen. Uh, even if it weren't for that. Um, I kinda think of it as a probabilities. So essentially whenever you ask co copilot or any of these LMS to, to do a task, the more it has to right, get the kind of more likely it's gonna make a mistake. Um, and so, uh, that's kind of why I like the functional level. It seems like I. Partially because it's not that much code that tends to write. Um, so you help to avoid kinda the probabilistic problem, but also because it's learned on a huge code base that has lots and lots of functions that have been implemented. It tends to do well at that, that solving the function kind of task. [00:19:10] Jeremy: Yeah. And I, I think the way you put it as outsourcing that designer, that decision is, is interesting because yeah, if you are working on a team and whether it's in code review or just in a discussion, often people will ask, well, well, why did you do it this way? Or Why, why is this the, you know, the good way to design it? And if you kind of handed that off to an l l m, maybe your answer is, I don't know. It's just what it it told me, which (laughs) [00:19:39] Dan: Yeah. [00:19:42] Leo: That isn't an answer I want to u use talking to my boss. Right. Well the chat GPT told me I should have it this way. That doesn't seem like a good answer. Choosing GitHub Copilot for CS1 [00:19:50] Jeremy: I think we, we've kind of been talking in more a general sense of working with LLMs and you've mentioned how you're gonna be teaching introductory computer science courses this coming, quarter or semester. And so when you teach these classes, what tools are you gonna recommend your students use? And yeah, maybe you could go into that a bit. [00:20:13] Leo: Absolutely. So we're gonna be recommending, um, At least, at least for my class, I'm gonna be recommending that they use, uh, vs code with copilot. Um, I just like the integration of the IDE with the, uh, interactions with the LLM uh, I think it avoids just a whole bunch of copy pasting from another interface into your IDE to then, uh, run it. I think it also reduces the barrier of them kinda immediately getting the code and then testing it right there in the environment. I'm sure any of the other tools would work, it's just, that seems to have worked well for us, uh, when we were writing the book. And that's, that's actually the technique we recommend in the book as well. Um, so that would be the primary tool for the students writing the code. In addition to having them using copilot with, uh, in the IDE for a lot of the code generation, depending on where things are at with copilot x, um, which is right now, um, available through wait list. Uh, if that's, that's available publicly, I think we're gonna be recommending that because it has a copilot chat feature, uh, which can be really nice to interact with. And, uh, the main use that, that we're gonna be encouraging students to use, whether it be co-pilot chat or a ChatGPT is in just a conversation with the LLM about, particularly modules and libraries. So if you are diving into, merging PDFs, which, uh, Dan did a great job in one of the chapters in our book talking about, if you wanna dive into that, well, what libraries should we be using in Python for that. Uh, and we found that the LLMs do a really good job at this, of actually saying, here are the different libraries you could use. Here are the pros and cons of them. These are the ones that, uh, need to be actually have additional install done. Or these ones that come in with, vanilla Python. they're actually really good at kind of giving you the what you should use for the various libraries. Um, and so that's, that's one other way that we were gonna be encouraging the students to use the LLM. Types of questions to ask the LLM [00:22:07] Dan: Yeah. So whenever the students or the junior programmer, doesn't know how or doesn't think they can, uh, do something in base Python, we have them interact with the chat and, and ask. So another example that comes to mind from the book is we have a chapter writing some games. And so for most games, including the two that, uh, we've got in the book, you need to be able to generate random numbers, right? So how do you do that? And so in the past you would've used a search engine stack overflow or something, and you would've found, some sample code and you would've pasted it in to your file and made variable name changes and things like that. And so what we do now is we ask chat, okay, I need to generate some random numbers. How do I do it? And then it will come back to you with a few options, and then you can systematically work through those options if you like. Uh, and you can ask, okay, is this one built into Python or not? And then it will tell you, oh, this one's not. We don't need to memorize API docs [00:23:11] Dan: And you say, oh, well, okay, so like, how do I install this? And then no, does it work on all OSS or just Windows? Right? So, uh, we guide the reader through these questions that you could have, uh, to help you make a decision. Um, and I think what I like the most about this is not having to learn. APIs, like yet another api. Like I don't, I don't think I have room, you know, in my, like, brain for any more APIs. And, and what's cool is I, I've forgotten like every API that, uh, we've used in the book. So we have like examples of emerging PDFs and, uh, removing duplicate images from directories, uh, from like people's phones, and, and stuff like that. And I don't know, I don't know which library it's using. Uh, and I'm, I'm totally okay with that, right? Like I just, I, I wanted to get the job done. I wanted to write a tool, and the tool got written and it used some sort of library and it worked great. And I didn't have to look through the documentation for that library and figure out like, which functions do I have to call and things like that. So, I, I know it, it can be fun, you know, it could be fun to really learn an API well, but a lot of people, they don't want to program for programming sake. Like, they just wanna get work done, right? So, you know, while I, I, I fully admit to, enjoying programming just for the sake of programming. I do a lot of competitive programming problems just for fun. You know, it's like Sunday morning and it's like, Hey, yeah, I got like an hour and I got an hour to work on something. Let me work on this little competitive programming problem. But, uh, a lot of people, they're not motivated by that. They're motivated by consequences of code. And this is one thing about LLMs that I'm very excited about, is you can just, make a lot more progress, without having to learn what these, people may believe is just useless knowledge, right? Like, does it really matter how I should invoke this api Right, to merge PDF files? I mean, the answer for many people is no. Like, they just want the result to happen. And I love how we can kinda match what they, uh, deem important, right? With the LLMs, it's like a new level of abstraction, for for many people. LLMs make building software possible for more people [00:25:28] Leo: There's a couple of audiences that come to our introductory classes, and what Dan's talking about here is one of the things I'm most excited about with this, and that's the students who come and take just one. Programming class. I know it's probably a different audience than, uh, a lot of the people listening right now. Um, but the people who just take one programming class, it's required for, for their major. They, I just wanted to explore it a little bit, but they, they don't go into this as a, as a career. I think a lot of those students right now, uh, if you ask them a year later to program something, do any of these tasks that we're talking about right now, I doubt they're able to, even if they did really well in that class. Uh, and that's really disappointing, right? If they've taken a programming class, they should be able to, to do something with that, a year or even five years later. And I really believe that if you teach them the skills of interacting with these LLMs, they'll be able to do these tasks later. They'll be able to come back and go, you know, I don't remember any of the Python syntax. I don't remember, uh, even how to get started with this. But you know what, I'm just gonna ask, uh, copilot, how do, how do I go about merging these PDFs, having this directory? And then, uh, the copilot chat comes back and says, oh, you might use this and that. And then they go, oh, I remember, I remember how to, how to write these functions. And I just said, you have to go over a prompt. I think they could really do it. And that, that's a bit of a game changer, right? That means a larger portion of our society will be able to, uh, write code and using a useful way. And I'm just really excited about that. I think it's gonna be really nice, uh, after the changes happen. More people might stick with Computer Science [00:26:58] Jeremy: I can totally see in the context of someone who's, not seeing it as a career, or someone who is like, hasn't done it in a while. It could be. These tools can be incredibly useful, right? Or it can even get you interested in this field at all, right? Like a lot of people, they, they struggle through the syntax and then they decide like, oh, this is not for me. Even though like they had something really cool they wanted to build and, and maybe these kind of tools can, can get them over that hump. [00:27:31] Leo: Exactly. I think there's a population of students, um, and it varies a bit by demographics, who come to computer science, with really the best motives in mind, right? They wanna make their goals in their life are to make the world a better place, and they want to achieve those goals. And if you spend the first three quarters or three semesters working with them and all they're seeing is syntax and they're not actually solving anything meaningful, um, it starts to create this disconnect of what their goals are for their life and what they think the goals of are, are career are. Of course as, as, as a computer science, I wanna say, stick it out. You know, if you, if you go into the fourth, fifth class, you'll start seeing how these are really useful tools that can make society a better place. But it'd be really nice to front load that and have them solving useful problems much earlier and seeing that, uh, computer science, uh, can be used in really nice ways. Efficency can be taught later [00:28:26] Jeremy: And, and so within the, the context of. People who are studying computer science will eventually, who may become professional software developers, things like that. Something more long term where it becomes more of a craft, the, the code that comes back from these large language models. Sometimes it could be something that's like not maybe the most easy to read or it may be doing something inefficiently. And I'm wondering from your perspective how users of these tools should, should think about that and, and recognize when that's a problem. [00:29:06] Dan: We in, in, in the first couple of courses, typically in the CS program, um, we don't spend much time on efficiency. the reason is that there's just so much to learn early on, and, um, we worry about overwhelming people with, know, too much, for them to, to process it at once. And we don't wanna prevent students from becoming interested, by. Giving them all of these requirements early on. So typically we, you know, we push efficiency, down the, down the road into like a data structures course, for example. But your question points to another reason why, we've decided to teach some of the skills we teach early on. So if, if a student, you know, came up to Leo or, or me and said, Hey, you know, like I wanna generate efficient code, how do I do it? My answer would, would be, so like, get, get familiar with programming first, but you are learning the skills necessary where you'll be able to look at code later because you know how to read it still, right? It's not, uh, something that you don't understand. You're gonna, you're gonna know it. We're gonna spend lots of time on code reading, and so later I think we can just teach efficiency the way we always did. Um, so, you know, doing, uh, time complexity analysis on, on the code and they're still gonna understand what the code is doing. So, um, I, I, I don't think this is going to, this is going to change much in, in the earliest courses. LLMs can expose students to different types of code [00:30:35] Leo: To the, to the point about code readability, I might add that, uh, certainly they're gonna get back some, some code that's maybe not the best style and it may not be as readable. Uh, but what's kinda interesting is that students aren't exposed to a lot of different styles kind of in our existing courses, right? They, they see the code that they write and they see the code that the professor writes and gives them, and there's not much else. And so, I mean, we're gonna need data and we're gonna need research to, to, to know this for sure, but it, it, I suspect them seeing lots of different code styles and having to read those different code styles may actually inform them better than we do now about what makes code more readable. Uh, and then they might be able to employ that as they go forward. [00:31:21] Jeremy: And, and when you're saying they're gonna read different styles and things like that, are you referring to code they're gonna see from the LLM or are you talking about them reading just other code bases in their classes or their professional work? [00:31:39] Leo: Oh, I'm sorry. Yeah, I was referring to the code. They'll see from the LLM Right [00:31:43] Jeremy: Oh I see [00:31:43] Leo: LLM will come back in all these different ways. They'll have different styles and they'll, uh, have different approaches to solving it. Right? Sometimes they'll, uh, come back with like this one line Lambda expression thing that solves it, and they'll have no idea how that works. And they'll, they'll ask for a different answer and they'll get, uh, a much more, uh, user-friendly first, uh, first programing experience kind of code back. And they'll be able to understand that and go, okay, this is the kind of code that I wanna see. Not this thing that was completely non-readable. [00:32:11] Dan: Yeah, Leo, I just thought of something. So, uh, so you know, by default you can get it to give you 10, uh, code segments to solve the problem, right? So it'd be kind of cool, if we ask students about each of them, right? Each of the 10, which ones are right, which ones have bugs, which ones have good style, which ones have bad style, it's like a built-in learning opportunity right there. So yeah. [00:32:34] Leo: Oh, it's true. Yeah. And, and so the 10 things that, uh, Dan I was referring to is if you do control, enter in vs code when you're working with a copilot, it'll give you back 10. Possible responses. And you're totally right Dan. You could just say of these 10, how readable are they? Are they right? Um, there's lots of fun things you can do to ask students questions. [00:32:51] Dan: and often many of them are right with just subtly different ways of, of, of, of solving the problem. I mean, I'll, I'll admit to having some fun looking through all of the suggestions just to kind of see what the variability is and when there's a lot of variability. I really like it because, uh, like Leo said, it exposes people to different styles they may not have seen before. And, um, may it may, it may, um, encourage you to ask questions, right? Like, why does this one work? Right? I've tested it. It doesn't look like it should work. Why does it work? I feel like that's the beginning of a pr pretty powerful learning experience right there. [00:33:30] Jeremy: Yeah, that makes sense to me because I, I think about how when a lot of people are doing software development before all these LLMs, they will search on the internet and go, okay, what's an existing answer for this thing I'm trying to do? They'll find a post on Stack Overflow and they'll find the accepted answer and it'll be like, okay, this is it. This is the solution. Whereas, at least in this case, it seems like you can go like, okay, well here's, here's 10, 10 potential solutions, and at least you get a little bit more exposure to, um, what are the different ways you could do it. [00:34:06] Leo: Exactly, and, and it's nice for 'em to see these different options. And I think there is, for professional software engineers seeing that stack overflow post, like, here's the accepted answer, integrating that into your code isn't a big jump for, for a lot of us. Um, but I do wanna stress that for the intro students, it often is a really big jump. Uh, just the, oh, how do I change around this? Oh, this was the interface for this function, but I'm been asked to have this other interface with a function and, and they really can struggle in that domain. And so I think copilot and these LLMs are nice in that they give back answers that are more tailored to the existing code that they're working with, um, and will reduce that barrier of them trying to incorporate the answer. Optimization can come later, most code is straightforward [00:34:50] Jeremy: So it seems kind of overall, when you're talking about people who are using programming in a more professional capacity, the code style and efficiency that will probably be taught very similarly to however it is now, where you basically have to get exposed to different styles and types of code, get exposed to the algorithms and and that will allow you to read the answers you get back better. So the answers you get back from the LLM with the knowledge you gain from these later courses, you'll be able to tell like, oh, okay, this is, this. Level of complexity, or this has like, you know, exponential, performance implications, that kind of thing. [00:35:43] Leo: So I think the performance piece is really important. Um, and I appreciate your, you bringing it up. I think, I'm, I'm kind of curious, uh, uh, what percentage of the time professional programmers are really spent, uh, are spending optimizing, uh, the code that they write? Um, I suspect a lot of the code that's written, uh, is pretty straightforward. Uh, you, you already know how to work with the database you're working with. You already know how to write the queries for that. You're, you're, you're just, uh, you're still doing something that, that's certainly thought provoking, but it's not the hard work of, oh, how am I gonna write design the right algorithm for this to get the exact best runtime? And so I think there are some times that that does matter, but those may be the times that the LLMs aren't as helpful and there's still gonna be a, a pretty big need for programmers who know how to do that, uh, themselves. [00:36:33] Jeremy: Yeah. I mean, I, I think that of course this is gonna vary from industry to industry, but Dan, you were talking about learning APIs and I feel like a lot of jobs are learning APIs and gluing them together. [00:36:49] Dan: Yeah. Um, I would agree, but I wonder what can happen if some of that's automated. Right? So maybe, people who are gluing APIs together will be able to. Get even more done, right? Incorporate even more, APIs in the same amount of time that they've been doing it. Now, I don't, I don't know if that job changes as dramatically as it, it seems, um, I guess there's this tension between people, having to change jobs or become more efficient in the current job. And, you know, obviously I, I hope it's the latter and there is some recent evidence that it could end up being, the latter, just more productive people overall, building, know, bigger software in incorporating more APIs than, than before and, and not overloading yourself. So, we'll, we'll see, you know, how it, how it all, um, how it all turns out. But I'm, I'm hopeful that we'll just be doing our jobs better. Reading code as a skill [00:37:51] Jeremy: In that, that context, sometimes people will say that the, the reading of code and comprehending code can sometimes be more difficult than writing the, the code. And in fact, can sometimes take you more time, like, let's say you've built out a project and now you need to add new features. Well, to add the feature, you have to understand the, the code base that existed before and so. When we talk about LLMs and the context of not programming, but just general writing, people talk about the fact that it's easy to generate more writing, right? We can generate more documents, blog posts, more articles, that sort of thing. And with code, it sounds like it'll be similar, right? Where it'll be easier for us to write more code, generate more code. Um, but I wonder if either of you have thought or, or think it's a concern that we'll be generating so much code that now we'll have so much we won't be able to even have the time to understand all of it, [00:38:55] Leo: I haven't thought much about the generating so much code that you can't understand. I mean, I think if, if we're generating code, I, I'm really hoping someone's testing and making sure it works right and stuff. And so I guess it depends on what kind of, uh, what level of the interface are we, we looking at. Um, but I have thought about a fair bit about the, the, what you described early on in your question, which was. Diving into a big code base, figuring out what needs to be changed and changing it, that is a really common task, especially for like new software engineers, uh, in their, their first jobs. Right. And it is also one that's really well documented in the, the education literature, uh, education literature, uh, that we aren't teaching them to do. Like we almost always are giving them, uh, right, these functions are really well defined or, uh, write the code all from yourself, but we rarely ever give them large code bases to learn from. Now I don't think diving into a large code base and trying to understand how it works is the right thing for like an intro class. And then we're mainly talking about, uh, students first learning your program here. Uh, but I am encouraged that we are teaching code reading as kind of a first level skill when I think current programming courses teach code reading right? In parallel with writing. So a lot of the writing's happening very early before they even know how to read well. Um, and so I think there's some optimism here that if we teach code reading first and make it a core skill, they'll be better set up in the later classes to maybe take on those large projects where they tackle the exact problem you're describing, which is also the exact thing they're gonna have to do when they get to, to their jobs. The amount of code we throw away may increase exponentionally [00:40:37] Jeremy: Yeah, it, it also kind of, I wonder sometimes when you're writing code, you'll write it in a certain way because it's tedious to write a lot of code, right? Like you'll, you'll make something generic in such a way where you can reuse it, and maybe reduce the amount of lines of code. But then when you have something, generate that code, maybe it'll be a solution that. Is a lot more code than you would've written personally, and it works. But, by nature, the fact that it was easy to generate, you chose that solution versus one that, that maybe was more generic and um, had less code. I, I'm not sure if that makes sense, but I'm kind of curious if the use of these models will sort of change maybe how we write code [00:41:30] Dan: I'm kind of wondering if the amount of code we throw away is going to increase exponentially. Because, because, um, you spend time working on something, you're probably gonna keep it. But I, I wonder because, uh, Jeremy, like what you said, it's, it's so easy to generate code now. so I, I've had this thought where, what, not sure how, how, um, how much I believe myself here, but, uh, should we be storing the, the prompt, like not the dot py file, right? Like just store the prompt and then if you do have to regenerate the code later, maybe you gotta make some tweaks or something. You just change the prompt and then, and then rerun it. So, because, because, because code is, um, It's not there yet, but it's, it's becoming free, right? It's becoming, you can generate as much of it as you want. And so I, I wonder how much, how much of it is, so there's, there's a lot of code already that you write once, and you run it once and then, and then you get rid of it or lose it or whatever. And I wonder if that, that practice will increase. So it's like, okay, you know, I wanna do this data analysis. Okay. So you write a prompt, you get some code, you generate some graph, and then you just don't even think about it. You just get rid of it, and then maybe later you want another similar analysis and you just do it again. Right. So I kind of wonder, because there's maybe less ownership now of code, right? You didn't like sweat as much to write the code. So maybe, maybe more of it gets thrown away. [00:43:03] Leo: I, I completely see what you're saying, Dan. So you have the prompt and you had it perform some form of data analysis and you wanna tweak it to do a slightly different data analysis. Uh, I wouldn't go into the, I mean, right now if I wrote the code from scratch, I would go into the code and find that one spot that I need to change and I would tweak it. But if I'm just generating the code, I would just tweak the prompt and then get a new piece of code that does exactly what I want there without having to, to [00:43:26] Dan: yeah. You know, how, how, it can take a, a long time to re-familiarize yourself with a program that you wrote six months ago. You know, it's like, oh, I, I called this variable temp one. Like, what's this for again? Right. you know, maybe, yeah, [00:43:41] Leo: Wait, I think we've all been there. Keeping the prompt instead of the code [00:43:43] Dan: Uh, but yeah, I don't know. It's just, just a thought I've been having. It's like, it, so, so when, when, now when, when I hear people talking about code maintenance, for example, like using, you know, good variable names and consistent style and stuff, in my head I'm thinking, well, you know, is, is the code the artifact now? Is it still the artifact? And right now, you know, of course it is. But, um, but, you know, fast forward a little while, maybe, maybe some of what I just said, uh, sort of becomes true eventually. [00:44:11] Leo: That's getting to perhaps kind a larger issue about what is the interface that we're, we work with as programmers. I've been thinking about this a lot, uh, just because I, I teach my, my background's. I have a PhD in computer architecture, and so I teach the classes that do machine code and assembly code, and they're, they're, they're core classes for computer scientists because you need to know how computers work. And, um, I think that's a core component, understanding that, But we don't start by teaching the students machine code. Like no one wants to learn how to program a machine. Um, at least I can't imagine anyone wanting to learn that. Um, and we've kind of cognitively picked Python or Java right now, the most common two programming language to learn from. Because they're easy to learn, they're easy to, to read. The code tends to be more understandable when you read it. It tends to be a little bit more forgiving when you write it. Um, and so we picked these because we think they're nice interfaces. They're, they're convenient for programmers and they're convenient for, for new learners. And it just seems to make sense that the LLM may be that next step of interface that we start choosing. The, the catch is because it can be wrong. It's not like a compiler. A compiler is deterministic. It's gonna be, uh, shy of that. Maybe one time in your career you find a compiler bug, like the compiler's always right. This time the LLM isn't always right and so I, I'm not sure how this is all gonna play out. Um, you can imagine the LLM as the new interface and all we ever store is, is code prompts and we don't ever even see the code, perhaps as one scenario. And the other is we, we do in fact still interact with the LLMs and still interact after the code. Um, but I think it's too early to kind of know where this is all gonna fall. But, um, we could see some big shifts, I think, in the field over the next few years. [00:45:52] Jeremy: Yeah, I think that's pretty interesting to think about what, what Dan had mentioned where yeah, you could check in your prompt and maybe a set of test cases for the app that's supposed to come out and yeah, maybe that's your alternative to the actual source code. Um, especially for things that, like you were saying, are, are used not that frequently or maybe you only use it once and so the, um, the quality of the actual code is. Maybe less so important in terms of readability and things like that. And as long as you can reliably reproduce that thing, yeah, maybe, maybe that does make sense. [00:46:39] Leo: The reliable reproduction could be the tricky part. And you there may be even saying that you, you start doing where you tag don't, don't try to reproduce this. Like, we actually spend a whole bunch of time on this. It's super optimized. Like, don't think the LLMs gonna give you this answer again. So, uh, keep the code along with the prompt. Keep the code too. Don't, don't scratch that because the LLMs not gonna do better. Um, and then in some cases you're like, yeah, the LLM's gonna do a pretty good job on this and [00:47:07] Dan: Yeah. Leo, maybe we have to Maybe we have to distinguish between code that you can just get out of an LLM no problem. And code that people have spent time working on. I like that. Yeah. Yeah, [00:47:21] Leo: some you're like, hashtag don't change. [00:47:23] Dan: Humans were here. [00:47:25] Leo: exactly. The concerns about relying on commercial tools [00:47:27] Jeremy: Yeah. this is the 30th iteration of this code we generated and we verified that this one's good. So just, just, it's a interesting, interesting future. We, we might be heading into, so, so one thing you, you mentioned a little bit earlier is that the tools that you're gonna recommend to your students, it sounds like it's primarily going to be GitHub copilot and GitHub copilot X for the, the chat interface. And one thing about these tools is these are tools by commercial companies, right? These are tools by OpenAI and Microsoft. They're tools that you have to pay a subscription fee to use. You have to send your code to a commercial server. And I wonder if that aspect concerns you at all. The, the fact that the foundations that our students are learning on is kind of reliant on these companies and these cloud services. [00:48:31] Leo: I think it's an amazing question. Uh, I think to some degree these are the tools that professional software engineers are using, and so we need, there's, there's a bit of an obligation as instructors to teach them the tools that they're gonna be using as professionals going forward. I think right now they're free. Uh, to use for, for education's sake. and so as long as that stays the case, I'm a little, more comfortable with it. If it started to move to a pay model for education, I think there could be some really big problems with equity. and I think it's not just true for, for computer science, but I'll start with computer science. I mean, if it's computer science and we start making it where you would have to pay to get access to these models or use these models, then whether we tell the students they can use it or not, they still can use them. And so there's gonna be some students that, the wealthier students who may have access to these, who are being able to learn better from these, being able to solve better homeworks with these, that's super scary. And you could imagine the same thing for even just K through 12 education, right? If you're thinking about them writing essays for homeworks or anything else, if it's a pay model, then the students who have, uh, the money will pay for it and get access to these tools. And the students who don't, won't. You could imagine the, all these kind of socioeconomic, uh, divides that already exist, only being exacerbated by these tools if they switch to this pay model. Um, so that has me very worried. Um, and there's some real ethical issues we have to think about when we're, we're using them. Yeah. Um, the other ethical issue I kinda wanna mention is just the, the copyright and the notion of ownership. Um, and I think it's important for us as instructors to engage students in the conversation about what it means to create content and intellectual property and how these models are built and what they're building off of. Um, and just engage in that ethical conversation with the students. I don't think we as a society have figured this out. I don't, I think there's gonna be some time both legally and ethically before we have the right answers. but at the very least, you need to talk to the students about, uh, these challenges so they know what's going on and they can engage in the debate. [00:50:45] Dan: Yeah, just to underscore that, Leo, this is the reason we're doing research on the first version of the course that Leo's teaching. We need research on the impact of LLMs, on students. especially, we need to know if students benefit from this, in what ways they benefit. How are these benefits distributed across demographic groups? We have a long and sad history in, computer science of inequities, in who takes our courses, who succeeds in our courses. we're very aware of this and it's, uh, unacceptable to make that situation, uh, worse than it already is. So, um, we're, we're gonna be carefully doing our research on this, uh, first offering of the course. A downside is students might bypass fundamentals [00:51:30] Jeremy: So we've mostly been talking about the benefits of using these tools in classes and in education. we just mentioned the possible inequities if you don't have access to those things, I, I wonder if from either of you, if there are negatives you see to this technology, whether that's the impact on what people learn or in anything else. Like are there downsides you see to the use of this technology? [00:52:04] Dan: Yeah. So in addition to, uh, the important, uh, inequity concerns that, uh, we just talked about, I have a concern about students using the tools in ways that. Don't help them learn the skills we think they need. So it's a, it's a, it's a power tool and you can, uh, you can get pretty far, I think with, without, um, being systematic in, in how you work with it and without testing, without debugging, um, it's, you know, it's, it's kind of magic right now. And so I can imagine, a lot of students just taking off at, you know, a hundred miles an hour. and so I'm one, one of, one of, uh, the things we have to worry about in these initial courses is, convincing students that there really are principles to using this technology. You can't just type something and get an answer and then go party. and, and, and so that, that is one of my concerns. That's one of the negatives. It's super powerful. And, like, like, so before you, you can't just type some Python and make it work and, but now you can sort of type in whatever you want and kind of get something back. and so part of our job as educators is to help students use these tools, in in a way that. Will ensure their long-term success with, with these tools, right? So, I, I'm not saying that they can't just do whatever they want and, and make some of their first assignments work. I, I think they could, I think they could be like un principled with the prompts and just throw it in there and get code and, you know, submit that, submit that code. But, uh, we're, we're going, you know, we're going for longer term, uh, effectiveness here, right? We have students who may not take another CS course. We need to keep them in mind. We have students who are gonna wanna eventually be software engineers, uh, security experts, PhDs in computer science, right? So we have a number of audiences that we're talking about, and we think they all need to know the fundamental skills of programming still. Even though, you know, they have this, this power tool at their expense now. [00:54:07] Leo: Speaking of the fundamental skills for programming, I, because of my, my hardware background, I'm this huge fan of teaching mental models in classes. Like what is the mental model of computation? Like, how, how do you imagine the computer is executing as you write the code? And, uh, ideally a professional computer scientist should be able to take, okay, well this is kind of the, my interpretation, this is my mental model for when I'm working at Python. If I really, really wanna drill this down, I can turn that into assembly. And if I really had to and turn to machine and even think about how this is working within the cash subsystems and virtual memory and all these things, I want 'em to be able to play those things out. We are changing the first class, and I think the first class is gonna be doing some things much better than before, like teaching problem decomposition and things like that. I'll, I'll mention that in a second, but, we are doing some things better. but we may not be teaching at how is the computer working as well. And so you can't just change one course and think the rest of the curriculum's gonna work. And so I think the entire curriculum's gonna need to adjust some, um, in, in a way of just adapting to these LLMs. Rethinking how to assess students [00:55:10] Leo: Um, the second piece for things getting potentially more challenging, uh, is instructors, we're in a good place right now as instructors, uh, in terms of how we assign and grade homework. Um, so grading, uh, this probably isn't gonna be a shock, is not one of our favorite things to do as faculty. I mean, it's actually really important. Uh, it's, it's central to us understanding how our students have learned, but it's generally not the most favorite thing that we do. And what a lot of instructors have done, myself included, is for much the introductory sequence. We have created assignments that can be entirely auto grade. So we define functions incredibly well. Like, write a really good description, this is exactly what it needs to do, and the students write that one piece of code and, uh, whether we like it or not. That is exactly when copilot does very well, and the LLMs do really well. And so the LLMs are gonna solve those very easily already. So we have to fix our assignments just like it, it's a given. Um, but it means that we're probably gonna have to rethink how we do assessment. Um, and so we're probably gonna be writing assignments that are much more open-ended and we're probably gonna have to be grading those, uh, putting more care and time integrating those potentially by hand. Uh, but I think these are all good things for the community and for the field. Um, but you can imagine how it's gonna be a bit of a, a shift for faculty and, and may take some time, uh, to be adopted as a result. [00:56:41] Jeremy: And, and so if you're shifting to homework that is more broad in scope, has more code, needs more human eyes on it, how how does that scale within the educators side? Right. You were, you were talking about how you've got, um, things that could be auto graded before and then now you're letting somebody generate this whole project. How does that work from your end? [00:57:09] Leo: I, I think there's a few things that are at play. Um, we, at, at large institutions like Dan and I are at, we have kind of armies of, uh, instructor assistants, instructional assistants that help us, uh, and so we can engage 'em in, in various tasks. And so, uh, one of the roles they heavily have now is helping students in the labs solve these auto grade assignments. and so you can imagine they will still be in the labs helping the students with these creative assignments, but now they're gonna have to have potentially a larger role in assessing the success of those. Um, there's been some really creative work, uh, in, in assessment and so I'll, I'll, I'll mention a couple of the ones, but there's, I, I'm sure I'm gonna be omitting some. But, uh, one is, Students could complete their project, and then they have to record a short video of them explaining the code that was in their project and how it worked, and you actually assess them on that video and their explanation of the code and how it works. Right? Because those can be perhaps shorter than trying to go through a really big project and, and see how it works. Um, there's a tool out of a UIUC, um, called Prairie Learn that helps with, um, uh, these are still auto graded, but uh, it helps with the, the test setting where you can write questions and have them, uh, graded kinda in a, in a exam or homework setting. the, the neat feature of that is that it can be randomized and so you don't have to worry as much about students kind of leaking information to each other about, test content from quarter to quarter. And so, because the randomization, they have to learn, actually learn the skills, and so you can, um, kinda engage with 'em in these test centers. And so right now a big grading burden on, on faculty is exams. And so you can actually give more exams, give more frequent feedback to the students and with, without the same grading burden. and so that, that's the other kinda exciting assessment piece. [00:59:01] Dan: Current assessment is not effective [00:59:01] Jeremy: In the different types of assessments, like the example of the video you gave, I'm just thinking to myself, well, the person could ask copilot or ChatGPT to give 'em a script, right? And they can rehearse that when they, um, send you a video. [00:59:18] Leo: I think, but I think that's, um, I think this is a philosophical shift in assessment that's kind of been gaining momentum over the years and that's that the assignments are all formative and they should all be. Pretty low stakes and the students should be doing them for the process of learning. and then, and, and it's unfair in some ways. There's a, there's a lot of things right now where you kind of grade them on, were you present at this time? Did you, did you meet this deadline at this time? Which if you're thinking about the, a diverse population of students, like you can imagine like a, a working mother who's also trying to do this, grading them on where you here at this time doesn't feel very equitable to me. And so there's this whole movement for grading for equity that shifts much of the assessment onto the exams. And so, yeah, the students could, uh, find multiple ways to cheat on the homeworks, but that's not the point of the homework and the homework's just to learn. It's a small scale, the grade, so. But you still then have those kinda controlled environments where they're taking these tests and that's where the grade actually comes from. Um, it's gonna take some time to make that shift, at, at, at least at a number of schools, my own included assess that those ho take home assignments are a huge portion of the grade. And students will love that because they can get all this help. And they can, especially with the auto graders, that they don't even write their own test cases. They just use the auto graders, the test cases. Right. Um, which is really depressing. Um, and they go to the, the, the instructional staff. The instructional staff tends to, to give away the answers. That's actually a paper that we, uh, published a few years ago. Um, and so the students love this high stakes, but tons of help version of assessment, but that may not actually measure their, their level of knowledge. And so it's gonna take a little bit of adjustment, for students and for faculty to do the shift, uh, to where the, as assess the, the exams are the Give students something interesting to build and don't worry about cheating [01:01:09] Dan: Yeah. Also, I'm, I'm not convinced that cheating is gonna be a problem here. it's very possible, for example, that students cheat on our previous assignments because the assignments were not authentic. Um, you know, in industry you're never going to, no one's gonna come up to you and say, Hey, like, from scratch, you know, write this exact function, takes two lists and determines, you know, how many values are equal between them. It's like, it's like, that's not gonna happen, right? You're gonna be doing something that has some sort of business purpose. And I kind of wonder, um, and this, this will, you know, this will play out, um, one way or another in the next, in the next, uh, few months. But I kind of wonder if we give students authentic tasks. Now you're cheating yourself right out of doing some, some something of value, right? Like before you were. You were probably cheating yourself out of a learning opportunity, but how, how can, you know, how can students know that? Right. The assignments boring, right? It's like, write all these functions and then something, something happens because of the magic, you know, starter glue code we wrote. So I
Sentry is an application monitoring tool that surfaces errors and performance problems. It minimizes the need to manually look at logs or dashboards by identifying common problems across applications and frameworks. David Cramer is the co-founder and CTO of Sentry. This episode originally aired on Software Engineering Radio. Topics covered: What's Sentry? Treating performance problems as errors Why you might no need logs Identifying common problems in applications and frameworks Issues with Open Telemetry data Why front-end applications are difficult to instrument The evolution of Sentry's architecture Switching from a permissive license to the Business Source License Related Links Sentry David's Blog Sentry 9.1 and Upcoming Changes Re-Licensing Sentry Transcript You can help edit this transcript on GitHub. [00:00:00] Jeremy: Today I'm talking to David Kramer. He's the founder and CTO of Sentry. David, welcome to Software Engineering Radio. [00:00:08] David: Thanks for having me. Excited for today's conversation. What's Sentry? [00:00:11] Jeremy: I think the first thing we could start with is defining what Sentry is. I know some people refer to it as an error tracker. Some people have referred to it as, an application performance monitoring tool. I wonder if you could kind of describe in, in your words what it is. [00:00:30] David: You know, as somebody who doesn't work in marketing, I just tell it how it is. So Sentry started out doing error monitoring, which. You know, dependent on who you talk to, you might just think of as logging, right? Like that's the honest truth. It is just logging just a different shape or form. these days it's hard to not classify us as just an APM tool that's like the industry that exists. It's like the tools people understand. So I would just say it's an APM tool, right? We do a bunch of things within that space, and maybe it's not, you know, item for item the same as say a product like New Relic. but a lot of the overlap's there, so it's like errors performance, which is like latency and sort of throughput. And then we have some stuff that just goes a little bit deeper within that. The, the one thing i would say that is different for us versus a lot of these tools is we actually only do application monitoring. So we don't do any since like systems or infrastructure monitoring. Meaning Sentry is not gonna tell you when you need to replace a hard drive or even that you need new hard, like more disk space or something like that because it's just, it's a domain that we don't think is relevant for sort of our customers and product. Application Performance Monitoring is about finding crashes and performance problems that users would associate with bugs [00:01:31] Jeremy: For people who aren't familiar with the term application performance monitoring, what is that compared to just error tracking? [00:01:41] David: The way I always reason about it, this is what I tell new hires and what I would tell, like my mother, if I had to explain what I do, is like, you load Uber and it crashes. We all know that's bad, right? That's error monitoring. We capture the crash report, we send it to developers. You load Uber and it's a 30 second spinner, like a loading indicator as a customer. Same outcome for me. I assume the app is broken, right? So we also know that's bad. Um, but that's different than a crash. Okay. Sentry captures that same thing and send it to developers. lastly the third example we use, which is a little bit more. I think, untraditional, but a non-traditional rather, uh, you load the Uber app and it's like a blank screen or there's no button to submit, like log in or something like this. So it's kind of like a, it's broken, but it maybe isn't erroring and it's not like a slow thing. Right. Same outcome. It's probably a bug of some sorts. Like it's what an end user would describe it as a bug. So for me, APM just translates to there are bugs, user perceived bugs in your application and we're able to monitor and, and help the software teams sort of prioritize and resolve those, those concerns. [00:02:42] Jeremy: Earlier you were talking about actual crashes, and then your second case is, may be more of if the app is running slowly, then that's not necessarily a crash, but it's still something that an APM would monitor. [00:02:57] David: Yeah. Yeah. And I, I think to be fair, APM, historically, it's not a very meaningful term. Like I as a, when I was more of just an individual contributor, I would associate APM to, like, there's a dashboard that will tell me what's slow in my application, which it does. And that is kind of core to APM, but it would also, none of the traditional tools, pre sentry would actually tell you why it's broken, like when there's an error, a crash. It was like most of those tools were kind of useless. And I don't know, I do actually know, but I'm gonna pretend I don't know about most people and just say for myself. But most of the time my problems are errors. They are not like it's fast or slow, you know? and so we just think of it as like it's a holistic thing to say, when I've changed the application and something's broken, or it's a bug, you know, what is that bug? How do we help people fix it? And that comes from a lot of different, like data signals and things like that. the end result is still the same. You either are gonna fix it or it's not important and you ignore it. I don't know. So it's a pretty straightforward, premise for us. But again, most companies in the space, like the traditional company is when you grow a big company, what happens is like you build one thing and then you build lots of check boxes to sell more things. And so I think a lot of the APM vendors, like they've created a lot of different products. Like RUM is a good example of another acronym that lives with an APM. And I would tell you RUM is completely meaningless. It, it stands for real user monitoring. And so I'm like, well, what's not real about monitoring the application? Well, nothing's not real, but like they created a new category because that's how marketing engines work. And that new category is more like analytics than it is like application telemetry. And it's only because they couldn't collect the app, the application telemetry at the time. And so there's just a lot of fluff, i would say. But at the end of the day too, like developers or engineering teams, it's like new version of the application. You broke something, let's tell you about it so you can fix it. You might not need logging or performance monitoring [00:04:40] Jeremy: And, and so earlier you were saying how this is a kind of logging, but there's also other companies, other products that are considered like logging infrastructure. Like I, I would think of companies like Paper Trail or Log Tail. So what space does Sentry fill that's that's different than that kind of logging? [00:05:03] David: Um, so the way I always think about it, and this is both personally true, and what I advise other folks is when you're building something new, when you start from zero, right, you can often take Sentry put it in, and that's good enough. You don't even need performance monitoring. You just need like errors, right? Like you're just causing bugs all the time. And you could do that with logging, but like the delta between air monitoring and logging is night and day. From a user experience, like error monitoring for us, or what we built at the very least, aggregates the errors. It, it helps you understand the frequency. It helps you when they're new versus old. it really gives you a lot of detail where logs don't, and so you don't need logging often. And I will tell you today at Sentry. Engineers do not use logs for the most part. Uh, I had a debate with one of our, our team members about it, like, why does he use logs recently? But you should not need them because logs serve a different purpose. Like if you have traces which tell you like, like fast and slow in a bunch of other network data and you have this sort of crash report collection or error monitoring thing, logs become like a compliance or an audit trail or like a security forensics, tool, and there's just not a lot of value that you would get out of them otherwise, like once in a while, maybe there's like some weird obscure use case, but generally speaking, you can just pretend that you don't need logs most days. Um, and to me that's like an evolution of the industry. And so when, when Sentry is getting started, most people were still logs. And if you go talk to SRE teams, they're like, oh, login is what we know. Some of that's changed a little bit, but. But at the end of the day, they should only be needed for more complicated audit trails because they're just not a good solution to the problem. It's just free form data. Structured or not, doesn't really matter. It's not aggregated. It's not something that you can really use. And it's why whenever you see logging tools, um, not even the papertrails of the world, but the bigger ones like Splunk or Cabana, it's like this weird, what we describe as choose your own adventure. Like go have fun, build your dashboards and try to make the logs useful kind of story. Whereas like something like Sentry, it's just like, why would you waste any time trying to build dashboards when we can just tell you when something new is broken? Like that's the ideal situation. [00:06:59] Jeremy: So it sounds like maybe the distinction is with a more general logging tool, like you mentioned Splunk and Kibana it's a collection of all this information. of things happening, even though nothing's necessarily wrong, whereas Sentry is more Sentry is it's going to log things, but it's only going to log things if Sentry believes something is wrong, either because of a crash or because of some kind of performance issue. People don't want to dig through logs or dashboards, they want to be told when something is wrong and whyMost software is built the same way, so we know common problems [00:07:28] David: Yeah. I, i would say it's about like actionability, right? Like, like nobody wants to spend their time digging through logs, digging through dashboards. Metrics are another good example of this. Like just charts with metrics on them. Yeah. They tell me something's happening. If there's lots of log statements, they tell me something's going on, but they're not, they're not optimized to like, help me solve a problem, right? And so our philosophy was always like, we haven't necessarily nailed this in all cases for what it's worth, but. It was like, the goal is we identify an actual problem, like close to like a root cause kind of problem, and we escalate that up and that's it. Uh, versus asking somebody to like go have to like build these dashboards, build these things, figure out what data matters and all this because most software looks exactly the same. Like if you have a web service, it doesn't matter what language it's written in, it doesn't matter how different you think your architecture is from somebody else's, they're all the same. It's like you've got a request, you've got a database, you've got some cache, you've got all these like known, known quantity things, and the slowness comes from the same places. Errors are structured while logs are not [00:08:25] David: The errors come from the same places. They're all exhibiting the same kinds of behavior. So logging is very unstructured. And what I mean by that is like there's no schema. Like you can hypothetically like make it JSON and everybody does that, but it's still unstructured. Whereas like errors, it's, it's a tight schema. It's like there's a type of error, there's a message for the error, there's a stack trace, there's all these things that you know. Right. And as soon as you know and you define those things, you can just build better products. And so distributed tracing is similar. Hypothetically, it's a little bit abstract to be fair, but hypothetically, distributed tracing is creating a schema out of basically network annotations. And somebody will yell at me for just simplifying it to that. I would tell 'em that's what it is. But, same goal in mind. If you know what the data is, you can take action on it. It's not quite entirely true. Um, because tracing is much more freeform. For example, it doesn't say if you have a SQL statement, it should be like this, it should be formatted this way, things like that. whereas like stack traces, there's a file name, there's there's a line number, there's like all these things, right? And so that's how I think about the delta between what is useful information and what isn't, I guess. And what allows you to actually build things like Sentry versus just build abstract exploration. Inferring problems rather than having user identify them [00:09:36] Jeremy: Kind of paint the picture of how someone would get started with a tool like Sentry. Do they need to tell Sentry anything about their application? Do they need to modify their source code at all? give us a picture of how that works. [00:09:50] David: Yeah, like one of our fundamentals, which I think applies for any real business these days is you've gotta like reduce user friction, right? Like you've gotta make it dead simple to use. Uh, and for us there were, there was like kind of a fundamental driving constraint behind that. So in many situations, um, APM vendors especially will require you to run an agent a basically like some kind of process that runs on your servers somewhere. Well, if you look at modern tech stacks, that doesn't really work because I don't run the servers half my stuff's in the browser, or it's a mobile app or a desktop app, and. Even if I do have those servers, it's like an entirely different team that controls them. So deploying like a sidecar, an agent is actually like much more complicated. And so we, we looked at that and also because like, it's much easier to have control if you just ship within the application. We're like, okay, let's build like an SDK and dependency that just injects into the, the application that runs, set an API key and then you're done. And so what that translates for Sentry is we spend a lot of time knowing what Django is or what Rails is or what expresses like all these frameworks. And just knowing how to plug into the right signals in those frameworks. And then at that point, like the user doesn't have to do anything. And so like the ideal outcome for Sentry is like you install the dependency in whatever language makes sense, right? You somehow configure the API key and maybe there's a couple other minor settings you add and that gives you the bare bones and that's it. Like it should just work from there. Now there's a lot you can do on top of that to enrich data and whatnot, but for the most part, especially for errors, like that's good enough. And that, that's always been a fundamental goal of ours. And I, I think we actually do it phenomenally well. [00:11:23] Jeremy: So it sounds like it infers things about the application without manual configuration. Can you give some examples of the kind of things that Sentry knows without the user having to tell it? [00:11:38] David: Yeah. So a good example. So on the errors side, we know literally everything because an error object in each language has all these attributes with it. It, it gives you the stack trace, it gives you a lot of these things. So that one's straightforward. On the performance side, we use a combination of leveraging some like open source, I guess implementations, like open telemetry where it's got all this instrumentation already and we can just soak that in, um, as well as we automatically instrument a bunch of stuff. So for example, say you've got like a Python application and you're using, let's say like SQL Alchemy or something. I don't actually know if this is how our SDK works right now, but, we will build something that's aware of that library and make sure it can automatically instrument the things it needs to get the right information out of it. And be fair. That's always been true for like APM vendors and stuff like that. The delta is, we've often gone a lot deeper. And so for Python for example, you plug it into an application, we'll capture things like the error, error object, which is like exception class name exception value, right? Stack trace, file, name, line number, all those normal things, function name. We'll also collect source code. So we'll, we'll give you sort of surrounding source code blocks for each line in the stack trace, which makes it infinitely easier to consume. And then in Python and, and php, and I forget if we do this anywhere else right now, we'll actually even allow you to collect what are called stack locals. So it'll, it'll give you basically the variables that are defined almost like a debugger. And that is actually, actually like game changing from a development point of view. Because if I can go look in production when there's an incident or a bug and I can actually see the state of the application. , I, I never need to know like, oh, what was going on here? Oh, what if like, do I need to go reproduce this somehow? I always have the right information. And so all of that for us is automatic and we only succeed like, it, it's, it's like by definition inside of Sentry, it has to be automatic. Like if we ask the user to do anything whatsoever, we're failing. And so whenever we design any product or anything, and to be fair, this is how every product company should operate. it's gotta be with as little user input as humanly possible. And so you can't always pull that off. Sometimes you have to have users configure stuff, but the goal should always be no input. Detecting errors through unhandled exceptions [00:13:42] Jeremy: So you, you're talking about getting a stack trace, getting, the state of variables, source code. That sounds like that's primarily gonna be through unhandled exceptions. Would you say that's, that's the primary way that you get error? [00:13:58] David: Yeah, you can integrate in other ways. So you can like trigger our API to capture an, uh, an exception. You can also, for better or worse, it's not always good. You can integrate through logging adapters. So if you're already using a logging framework and you log their errors there, we can often capture those. However, I will say in most cases, people use the logging APIs wrong and the data becomes junk. A good, a good example of this is like, uh, it varies per language. So I'm just gonna go to Python because Python is like sort of core to Sentry. Um, in Python you have the ability to log messages, you can log them as errors, you can log like actual error objects as errors. But what usually happens is somebody does a try-catch. They, they capture the error they rescue from it. They create a logging call, like log dot error or something, put the, the error message or value in there. And then they send that upstream. And what happens is the stack trace is gone because we don't know that it's an error object. And so for example, in Python, there's actually an an A flag. You pass the logging call to make sure that stack trace stays present. But if you don't know that the data becomes junk all of a sudden, and if we don't have a stack trace, we can't actually aggregate data because like there's just not enough information to like, to run hashing on it. And so, so there are a lot of ways, I guess, to capture the information, but there are like good ways and there are bad ways and I think it, it's in everybody's benefit when they design their, their apt to like build some of these abstractions. And so like as an example, when, whenever I would start a new project these days, I will add some kind of helper function for me to like log an exception when I like, try catch and then I can just plug in whatever I need later if I want to enrich the data or if I wanna send that to Sentry manually or send it to logs manually. And it just makes life a lot easier versus having to go back and like augment every single call in the code base. [00:15:37] Jeremy: So it, it sounds like. When you're using a tool like Sentry, there's gonna be the, the unhandled exceptions, which are ones that you weren't expecting. So those should I guess happen without you catching them. And then the ones that you perhaps do anticipate, but you still consider to be a problem, you would catch that and then you would add some kind of logging statement to your code that talks to Sentry directly. Finding issues like performance problems (N+1 queries) that are not explicit errorsz [00:16:05] David: Potentially. Yeah. It becomes a, a personal choice to be fair at that, at that point. but yeah, the, the way, one of the ways we've been thinking about this lately, because we've been changing our error monitoring product to not just be about errors, so we call it issues, and that's in the guise of like, it's like an issue tracker, a bug tracker. And so we started, we started putting what are effectively like, almost like static analysis concerns inside of this issue tracker. So for example, In our performance monitor, we'll do something called like detect n plus one queries, which is where you execute a, a duplicate query in a loop. It's not necessarily an error. It might not be causing a problem, but it could be causing a problem in the future. But it's like, you know, the, the, the qualities of it are not the same as an error. Like it's not necessarily causing the user to experience a bug. And so we've started thinking more about this, and, and this is the same as like logging errors that you handle. It's like, well, they're not really, they're not really bugs. It's like expected behavior, but maybe you still want to keep it like tracking somewhere. And I think about like, you know, Lins and things like that, where it's like, well, I've got some things that I definitely should be fixing. Then I've got a bunch of other stuff that's like informing me that maybe I should take action on or not. But only I, the human can really know at the end of the day, right, if I, if I should prioritize that or not. And so that's how I kind of think about like, if I'm gonna try catch and then log. Yeah, you should probably collect that data. It's probably less important than like the, these other concerns, like, like an actual unhandled exception. But you do, you do want to know that they're happening and whatnot. And so, I dunno, Sentry has not had a strong opinion on this historically. We're just like, send us whatever you want to capture in this regard, and you can pay for it, that's fine. It's like usage based, you know? we're starting to think a lot more about what should that look like if we, if we go back to like, what's the, what's the opinion we have for how you should use the product or how you should solve these kinds of software problems. [00:17:46] Jeremy: So you gave the example of detecting n plus one queries is, is that like being aware of the framework or the ORM the person is using and that's how you're determining this? Or is it at more of a lower level than that? [00:18:03] David: it is, yeah. It's at the framework level. So this is actually where Open Telemetry causes a lot of harm, uh, for us because we need to know what a database query is. Uh, we need to know like the structure of the query because we actually wanna parse it out in a lot of cases. Cause we actually need to identify if it's duplicate, right? And we need to know that it's a database query, not a random annotation that you've added. Um, and so what we do is within these traces, which is like if you, if you don't know what a trace is, it's basically just like, it's a tree, like a tree structure. So it's like A calls B, calls C, B also calls D and E and et cetera, right? And so you just, you know, it's a trace. Um, and so we actually just look at that trace data. We try to find these patterns, which is like, okay, B was a, a SQL query or something. And every single sibling of B is that same SQL query, but sort of removing certain parameters and stuff for the value. So we'll look at that data and we'll try to pull out anomalies. So m plus one is an example of like a fairly obvious anti pattern that everybody knows is bad and can be optimized. Uh, but there's a lot of other that are a little bit more subjective. I'll give you an example. If you execute three SQL statements back to back, one could argue that you could just batch those SQL statements together. I would argue most of the time it doesn't matter and I don't need to do that. And also it's not guaranteed that that is better. So it becomes much more like, well, in my particular situation this is valuable, but in this other situation it might not be. And that's where I go back to like, it's almost like a linter, you know? But we're trying to infer all of that from the data stream. So, so Sentry's kind of, we're kind of a backwards product company. So we build our product from a technology vision, not from customers want this, or we have this great product vision or anything like that. And so in our case, the technology vision is like, there's a lot of application data that comes in, a lot of telemetry, right? Errors, traces. We have a bunch of other streams now. within that telemetry there is like signal. And so one, it's all structured data so we know what it is so we can actually interpret it. And then we can identify that signal that might be a problem. And that signal in our case is often going to translate to like this issue concept. And then the goal is like, well, can we identify these problems for people and surface them versus the choose your own adventure model, which is like, we'll just capture everything and feed it to the user and they can figure out what matters. Because again, a web service is a web service. A database is a database. They're all the same problems for everybody. All you know, it's just, and so that's kind of the model we've built and are continuing to evolve on and, and so far works pretty well to, to curate a lot of these workflows. Want to infer everything, but there are challenges [00:20:26] Jeremy: You talked a little bit about how people will sometimes use tracing. And in cases like that, they may need some kind of session ID to track. Somebody making a call to a service and that talks to a database and that talks to other services. And you, inside of your application, you have to instrument some way of tracking. This all came from this one request. Is that something that Sentry can infer or is there something that the developer has to put into play so that you can track that sort of thing? [00:21:01] David: Yeah, so it's, it's like a bit of both. And i would say our goal is that we can infer everything. The reality is there is so much complexity and there's too much of a, like, too many technologies in the world. Like I was complaining about this the other day, like, the classic example on web service is if we have a middleware hook, We kind of know request response, usually that's how middleware would work, right? And so we can infer a lot from there. Like basically we can infer the boundaries, which is a really big deal. Okay. That's one thing is boundaries is a problem. What we, we describe that as a transaction. So like when the request starts. When the request ends, right? That's a very important boundary for everybody to understand because when I'm working on the api, I care about the API boundary. I actually don't care about what the database is doing at its low level or what the JavaScript application might be doing above it. I want my boundary. So that's one that we kind of can do. But it's hard in a lot of situations because of the way frameworks and technology has been designed, but at least traditional stuff like a, a traditional web stack, it works like a Rails app or a DDjango app or PHP app kind of thing, right? And then within that it becomes, well, how do you actually build a trace versus just have a bunch of arbitrary labels? And so we have a bunch of complicated tech within each language that tries to establish that tree. and then we annotate a lot of things along the way. And so we will either leverage Open Telemetry, which is an open format spec that ideally has very high quality data. Ideally, not realistically, but ideally it has high quality data. Every library author implements it great, everybody's happy. We don't have to do anything ever again. The reality is that data is like all over the map because there's not like strict requirements for what, how the data should be labeled and stuff. And not everything even has that data. Like not everything's instrumented with open telemetry. So we also have a bunch of stuff that, unrelated to using that we'll say, okay, we know what this library is, we're gonna try to infer some characteristics from this library, or we know what maybe like the DDjango template engine is. So we're gonna try to infer like when the template renders so you can capture that block of information. it is a very imperfect science and I would tell you like it's not, even though like Open Telemetry is a very fun topic for people. It is not necessarily good, like it's not in a good state. Could will it ever be good? I don't know in all honesty, but like the data quality is like all over the map and so that's honestly one of our biggest challenges to making this experience that, you know, tells you what's going on in your database so it tells you what's going on with the cash or things like this is like, I dunno, the cash might be called something completely random in one implementation and something totally different in another. And so it's a lot of like, like data normalization that you have to deal with. But for the most part, those libraries of things you don't control can and will be instrumented. Now the other interesting thing, which we'll see how this works out, so, so one thing Sentry tries to do there, we have all these layers of telemetry, so we have errors and traces, right? Those are pretty high level concepts. We also have profiling data, which is very, very, very, very low level. So it's usually only if you have like disc. I like. It's where is all the CPU time being spent in my application? Mostly not waiting. Like waiting's usually like a network call, right? But it's like, okay, I have a loop that's doing a lot of math, or I'm writing a bunch of stuff to disc and that's really slow. Like often those are not instrumented or it's like these black box areas of a performance. And so what we're trying to do with profiling data, instead of just showing you flame charts and stuff, is actually say, could we fill in these gaps in these traces? Like basically like, Hey, I've got a long period of time where the app's doing something. You know, here's an API call, here's the database stuff. But then there's this block, okay, what's that function or something? Can we pull that out of the profiling data? And so in that case, again, that's just automatic because the profile actually knows everything about the application and know it. It has full access to the function and the stack and everything, right? And so the dream is that you would just always have everything filled in the, the customer never has to do anything with one minor asterisk. And the asterisk is what I would call like business context. So a good example would be, You might wanna associate requests with a specific customer or something like that. Like you might wanna say, well it's uh, I don't know, Goldman Sachs or one of these big companies or something. So you can know like, well when Goldman Sachs is having performance issues or whatever it is, oh maybe I should focus on them cuz maybe they pay you a lot of money or something. Right. Sentry would never know that at the end of the day. So we also have these like kind of tagging contextual APIs that will say like, tell us some informations, maybe it's like customer, maybe it's something else that's relevant to your application. And we'll keep that data associated with the telemetry that's like present, you know, um, but the, at least the telemetry, like again, application's just worth the same, should be, there should be a day in the next few years that it's just all automatic. and again, the only challenge today is like, can it be high quality and automatic? And so that, that's like to be determined. [00:25:50] Jeremy: What you're kind of saying is the ideal is being able to look at this profiling information and be able to build a full picture of. a, a call from beginning to end, all the different things to talk to, but I guess what's the, what's the reality today? Like, what, what is Sentry able to determine, in the world we live in right now? [00:26:11] David: So we've done a lot of this like performance detection stuff already. So we actually can do a lot now. We put a lot of time into it and I, I will tell you, if you look at other tools trying to do tracing, their approach is much more abstract. It's like your traditional monitoring tool that's like, we're just gonna collect a lot of signals and maybe we'll find magic anomaly detection or something going on in it, which, you know, props, but that can figure that out. But, a lot of what we've done is like, okay, we kind of know what this data looks like. Let's go after this very like known quantity problem. Let's normalize the data. And let's make it happen like that's today. Um, the enrichment of profiles is new for us, but it, we actually can already do it. It's not perfect. Detection of blocking the UI thread in mobile apps [00:26:49] David: Um, and I think we're launching something in April or May, something around the, that timeframe where hopefully for the, the technologies we can instrument, we're actually able to surface that in a useful way. but as an example that, that concept that I was talking about, like with n plus one queries, the team built something using profiling data. and I think this, this might be for like a mobile app more so than anything where mobile apps have this problem of, it's, you've got a main thread and if you block that main thread, the app is basically frozen. You see this on desktop apps all the time. You, you very rarely see it on web apps anymore. But, but it's a really big problem when you have a web, uh, a mobile or desktop app because you don't want that like thing to be non-responsive. Right? And so one of the things they did was detect when you're doing like file io on the main thread, you know, right. When you're writing a disc, which is probably a slow thing or something like that, that's gonna block the whole thing. Because you should just do it on a separate thread. It's like an easy fix, potentially may not be a problem, but it could become a problem. Same thing as n plus one. But what's really interesting about it is what the team did is like they used the profiling data to detect it because we already know threads and everything in there, and then they actually recreated a stack trace out of that profiling data when it's surfaced. So it's actually like useful data with that. You could like that I or you as a developer might know how to take and actually be like, oh, this is where it happens at the source code. I can actually figure it out and go fix it myself. And to me, like as like I, I'm still very much in the weeds with software that is like one of the biggest gaps to most things. Is it just, it doesn't make it easy to consume or like take action on, right? Like if I've got a, a chart that says my error rate is high, what am I gonna do with that? I'm like, okay, what's breaking? That's immediately my next question. Right? Okay. This is the error. Where is that error happening at? Again, my next question, it, it's literally just root cause analysis, right? Um, and so that, that to me is very exciting. and I, I don't know that we're the first people to do that, I'm not sure. But like, if we can make that kind of data, that level of actionable and consumable, that's like a big deal for me because I will tell you is like I have 20 years of software experience. I still hate flame charts and like I struggle to use them. Like they're not a friendly visualization. They're almost like a, a hypothetically necessary evil. But I also think one where nobody said like, do we even need to use that? Do we need that to be like the way we operate? and so anyways, like I guess that's my long-winded way of saying like, I'm very excited for how we can leverage that data and change how it's used. [00:29:10] Jeremy: Yeah. So it sounds like in this example, both in the mobile app blocking the UI or the n plus one query is the Sentry, suppose, SDK or instrumentation that's hooked inside of your application. There are certain behaviors that it knows are, are not like ideal I guess, just based on. people's prior experience, like your own developers know that, hey, if you block the UI thread in this mobile application, then you're gonna have performance problems. And so that way, rather than just telling you, Hey, your app is slow, it can tell you your app is slow and it's because you're blocking the UI thread. Don't just aggregate metrics, the error tracker should have an opinion on what actual problems are [00:29:55] David: Exactly, and I, and I actually think, I don't know why so many people don't recognize this gap, because at the end of the day, like, I don't know, I don't need more people to tell me response times are bad or anything. I need you to have an opinion about what's good because. The only way it's like math education, right? Like, yeah, you learn the basics, but you're not expected to say, go to calc, but, and then like, do all the fundamentals. You're like, don't get a calculator and start simplifying the problem. Like, yeah, we're gonna teach you a few of these things so you understand it. We're gonna teach you how to use a calculator and then just use the calculator and then make it easier for everybody else. But we're also not teaching you how to build a calculator because who cares? Like, that's not the purpose of it. And so for me, this is like, we should be helping people sort of get to the finish line instead of making them run the entirety of the race over and over if they don't need to. I don't, I don't know if that's a good analogy, but that has been the biggest gap, I think, in so much of this software throughout the industry. And it's, it's, it's common everywhere. And there's no reason for that gap to exist these days. Like the technology's fine. And the technology's been fine for like 10 years. Like Sentry started in oh eight at this point. And I think there was only one other company I recall at the time that was doing anything that was even similar to like air monitoring and Sentry when we built it, we're just like, what if we just go deeper? What if we collect all this information that will help you debug the problem instead of just stopping it like a log aggregator or something kind of thing, so we can actually have an opinion about it. And I, I genuinely, it baffles me that more people do not think this way because it was not a hard problem at the time. It's certainly not hard these days, but there's still very, I mean, a lot more people do it now. They've seen Sentry successful and there's a lot of similar implementations, but it's, it's just amazes me. It's like, why don't you, why don't people try to make the data more actionable and more useful, the teams versus just collect more of it, you know? 40 people working on learning the common issues with languages and frameworks [00:31:41] Jeremy: it, it sounds like maybe the, the popularity of the stack the person is using or of the framework means that you're gonna have better insights, right? Like if somebody makes a, a Django application or a Rails application, there's all these lessons that your team has picked up in terms of, Hey, if you use the ORM this way, your application is gonna be slow. Whereas if somebody builds something totally homegrown, you won't know these patterns and you won't be able to like help as much basically. [00:32:18] David: Yeah. Yeah, that's exactly, and, and you might think that that is a challenge, but then you look at how many employees exist at like large tech companies and it's, it's not that big of a deal, like, , you might even think collecting all the information for each, like programming, runtime or framework is a challenge. We have like 40 people that work on that and it's totally fine. Like, and, and so I think actually all these scale just fine. Um, but you do have to understand like the domain, right? And so the counter version of this is if you look at say like browser applications, like very rich, uh, single page application type experiences. It's not really obvious like what the opinions are. Like, like if, if you, and this is like real, like if you go to Sentry, it's, it's kind of slow, like the app is kind of slow. Uh, we even make fun of ourselves for how slow it is cuz it's a lot of JavaScript and stuff. If you ask somebody internally, Hey, how would we make pick a page fast? They're gonna have no clue. Like, even if they have like infinite domain experience, they're gonna be like, I'm not entirely sure. Because there's a lot of like moving parts and it's not even clear what like, like good is right? Like we know n plus one is bad. So we can say not doing that is the better solution. And so if you have a JavaScript app, which is like where a lot of the slowness will come from is like the render times itself. Like how do you fix it? You, you can't actually build a product that tells you what to fix without knowing how to fix it, right? And so some of these newer and very fast moving targets are, are frankly very difficult for us. Um, and so that's one thing that I think is a challenge for the entire industry. And so, like, as an example, a lot of the browser folks have latched onto web vitals, which are just metrics that hopefully tell you something about the application, but they're not always actionable either. It'll be like, the idea with like web vitals is like, okay, time to interactive is an an important metric. It's like how long until the page loads that a user can do what they're probably there to do. Okay. Like abstractly, it makes sense to us, but like put into action. How do I optimize time to interactive? Don't block the page. That's one thing. I don't know. Defer assets, that's another thing. Okay. So you've gotta like, you've gotta build a technology that knows these assets could be deferred and aren't. Okay, which ones can be deferred? I don't know. Like, it, it, it's like such a deep rabbit hole. And then the problem is, six months from now, the tech will have completely changed, right? And it won't have like, necessarily solved some of these problems. It will just have changed and they're now a completely different shape of problem. But still the same fundamental like user experience is the same, you know? Um, and to me that's like the biggest challenge in the industry right now is that like dilemma of the browser at the end of the day. And so even from our end, we're like, okay, maybe we should step back, focus on servers again, focus on web services. Those are known quantities. We can do that really well. We can sort of change that to be better than it's been in the past and easier to consume with things like our n plus one detections. Um, and then take like a holistic, fresh look at browser and say, okay, now how would we solve this to make sure we can actually really latch onto the problems that like people have and, and we understand, right? And, you know, we'll see when we get there. I don't think any product does a great job these days for helping, uh, solve those problems. . But I think even without the, the products, like I said, like even our team would be like, fixing this is gonna take months because it's gonna take months just to figure out exactly where the, the common bottlenecks are and all these other things within an application. And so I, I guess what I mean to say with that is there's a lot of opportunity, I think with the moving landscape of technology, we can find a way to, whether it's standardized or Sentry, can find a way to make that data actionable want it something in between there. There are many ways to build things on the frontend with JavaScript which makes it harder to detect common problems compared to backend [00:35:52] Jeremy: So it sounds like what you're saying, With the, the back end, there's almost like a standard way of doing things or a way that a lot of people do it the same way. Whereas on the front end, even if you're looking at a React application, you could look at tenant react applications and they could all be doing state management a totally different way. They could be like the, the way that the application is structured could be totally different, and that makes it difficult for you to infer sort of these standard patterns on the front end side. [00:36:32] David: Yeah, that's definitely true. And it, it goes, it's even worse than that because well, one, there's just like the nature of JavaScript, which is asynchronous in the sense of like, it's a lot of callbacks and things like that. And so that already makes it hard to understand what's going on, uh, where things are happening. And then you have these abstractions like React, which are very good, but like they pull a lot of that away. And so, as an example of a common problem, you load the application, it has to do a lot of stuff to make the page render. You might call that hydration or whatever. Okay. And then there's a completely different state, which is going from, it's already hydrated. Page one, I, I've done an interaction or something. Or maybe I've navigated a page too, that's an entirely different, like, sort of performance problem. But that hydration time, that's like a known thing. That's kind of like time to interactive, right? But if the problem is in your framework, which a lot of it is like a lot of the problems today exist because of frameworks, not because of the technology's bad or the framework's bad, but just because it's abstracted and it's really hard to make it work in all these situations, it's complicated. And again, they have the same problem where it's like changing non sem. And so if the problem is the framework is somehow incorrectly re rendering the page as an example, and this came up recently, for some big technology stack, it's re rendering the page. That's a really bad problem for the, the customer because it's making the, it's probably actually causing a lot of CPU seconds. This is why like your Chrome browser tabs are using so much memory in cpu, right? How do you fix that? Can you even fix that? Do you just say, I don't know, blame the technology? Is that the solution? Maybe that is right, but how would we even blame the technology like that alone, just to identify why it's happening. and you need to know the why. Right? Like, that is such a hard problem these days. And, and personally, I think the only solution is if the industry sort of almost like standardizes on a way to like, on a belief of how this should be optimized and how it should be measured and monitored kind of thing. Because like how errors work is like a standardization effectively. It may not be like a formal like declaration of like, this is what an error is, but more or less they always have the same attributes because we've all kind of understood that. Like those are the valuable things, right? Okay. I've got a server rendered application that has client interaction, which is sort of the current generation of the technology. We need to standardize on what, like that web request, like response life cycle is, right? and what are the moving targets within there. And it just, to me, I, I honestly feel like a lot of what we use every day in technology is like beta. Right. And it's, I think it's one of the reasons why we're constantly always having to up, like upgrade and, and refactor and, and, and shift dependencies and things like that because it is not, it's very much a prototype, right? It's a moving target, which I personally do not think is great for the industry because like customers do not care. They do not care that you're using some technology that like needs a change every few months and things like that. now it has improved things to be fair. Like web applications are much more like interactive and responsive sometimes. Um, but it is a very hard problem I think for a lot of people in the world. [00:39:26] Jeremy: And, and when you refer to, to things feeling like beta, I suppose, are, are you referring to the frameworks people are using or the libraries they're using to support their front end development? I, I'm curious what you're, you're thinking there. [00:39:41] David: Um, I think it's everything. Even like the browser APIs are constantly shifting. It's, that's gotten a little bit better. But even the idea like type script and stuff, it's just like we're running like basically compilers to make all this code work. And, and so the, even that they're constantly adding features just because they can, which means behaviors are constantly changing. But like, if you look at a real world example, like React is like the, the most dominant technology. It's very well designed for managing the dom. It's basically just a rendering engine at the end of the day. It's like it's managed to process updates to the dom. Okay. Makes sense. But we've all learned that these massive single page applications where you build all your application logic and loaded into a bundle is a problem. Like, like, I don't know how big Sentry's bundle is, but it's multiple megs in size and it takes a little while for like a, even on fast fiber here in the Bay Area, it takes a, you know, several seconds for the UI to load. And that's not ideal. Like, it's like at some point half of us became okay with this. So we're like, okay, what we need to do is go back, literally just go back 10 years and we need to render it on the server. And then we need some stuff that makes interactions, you know, highly responsive in the UI or dynamic content in the ui, you know, bring, it's like bringing back jQuery or something. And so we're kind of going full circle, but that is actually like very complicated because the way people are trying to do is like, okay, we wanna, we wanna have the rendering engine operate the same on the server and is on as on the client, right? So it's like we just write one, path of code that basically it's like a template engine to some degree, right? And okay, that makes sense. Like we can all get behind that kind of model. But that is actually really hard to make work with a lot of people's software and, and I think the challenge and framers have adopted it, right? So they've taken this, so for example, it's like, uh, react server components, which is basically just like, can we render it on the server and then also keep that same interaction in the ui. But the problem is like frameworks take that, they abstract it and so it's another layer of complexity on something that is already enormously complex. And then they add their own flavor onto it, like their own opinions for maybe what the world way the world is going. And I will say like personally, I find those. Those flavors to be very hard to adapt to like things that are tried and true or importantly in this context, things that we know how to monitor and fix, right? And so I, I don't know what, what the be all end all is, but my thesis on this is you need to treat the UI like a template engine, and that's it. Remove all like complexity behind it. And so if you think about that, the term I've labeled it as, which I did not come up with, I saw this from somebody at some point, is like, it's like your front end as a service. Like you need to take that application that renders on the server and the front end, and it's just an entirely different application, which is annoying. and it just calls your APIs and that's how it gets the data it needs. So you're literally just treating it as if it's like a single page application that can't connect to your database. But the frameworks have not quite done that. And they're like, no, no, no. We'll connect to the database and we'll do all this stuff, but then it doesn't work because you've got, like, it works this way on the back end and this way on the front end anyways. Again, long winded way of saying like, it's very complicated. I don't think the technology can solve it today. I think the technology has to change before these problems can actually genuinely become solvable. And that's why I think the whole thing is like a beta, it's like, it's very much like a moving target that we're eventually we'll get there and it's definitely had value, but I don't know that, um, responsiveness for low latency connections is where the value has been created. You know, for like folks with bad internet and say remote Africa or something, like I'm sure the internet is not a very fun place for them to use these days. Some frontend code runs on the server and some in the browser which creates challenges [00:43:05] Jeremy: I guess one of the things you mentioned is there's this, almost like this split where you have the application running on the server. It has its own set of rules because it, like you said, has access to the database and it can do things that you can't do in the browser, and then you have to sort of run the same application in the browser, but it's not quite the same application because it doesn't have access to the same things in the browser. So you have this weird disconnect, I suppose. [00:43:35] David: Yeah. Yeah. And, and, and then the challenges is like a developer that's actually complicated for you from the experience point of view, cuz you have to know somehow, okay, these things are ta, these are actually running on the server and only on the server. And like, so I think the two biggest technologies that try to do this, um, or at least do it well enough, or the two that I've used, there might be some others, um, are NextJS and remix and they have very different takes on how to do this. But, remix is the one I use most recently. So I, I'll comment on that. But like, there's a, a way that you kind of say, well, this only runs on, I think the client as an example. And that helps you a little bit. You're like, okay, this is only gonna render on the client. I can, I actually can think about that and reason about that. But then there's this thing like, okay, sometimes this runs on the server, only this part runs on the server. And it's, it just becomes like the mental capacity to figure out what's going on and debug it is like so difficult. And that database problem is like the, the normal problem, right? Like of like, I can only query the database on the server because I need secure credentials or something. Okay. I understand that as a developer, but I don't understand how to make sure the application is doing what I expect it to do and how to fix it if something goes wrong. And that, that's why I think. , I'm a, I'm a believer in constraints. The only way you make progress is you simplify problems. Like you just give up on solving the complicated thing and you make the problem simpler. Right? And so for me, that's why I'm like, just take the database outta the equation. We can create APIs from the client, from the server, same security levels. Okay? Make it so it can only do that and it has to be run as almost like a UI only thing. Now that creates complexity cuz you have to run this other service, right? And, and like I personally do not wanna have to spin up a bunch of containers just to write like a simple like web application. but again, I, I think the problem has not been simplified yet for a lot of folks. Like React did this to be fair, um, it made it a lot easier to, to build UI that was responsive and, and just updated values when they changed, you know, which was a big deal for a long period of time. But I feel like everything after has not quite reached that that area, whereas it's simple and even react is hard to debug when it doesn't do what you want it to do. So I don't know, there, there's so gaps I guess is what i would say. And. Hopefully, hopefully, you know, in the next five years we'll kind of see this come to completion because it does feel like it's, it's getting closer to that compromise. You know, where like we used to have pure server rendered apps with some weird janky JavaScript on top. Now we've got this bridge of really complicated, you know, JavaScript on top, and the server apps are also complicated and it's just, it's a nightmare. And then this newer generation of these frameworks that work for some types of technology, but not all. And, and we're kind of almost coming full circle to like server rendered, you know, everything. But with like allowing the same level of interactions that we've been desiring, I guess, on the web. So, and I, fingers crossed this gets better, but right now I do not see like a clear like, oh, it's definitely there. I can see it coming. I'm like, well, we're kind of making progress. I don't love being the beta tester of the whole thing, but we're kind of getting there. And so, you know, we'll see. There are multiple ways to write mobile apps as well (flutter, react native, web views) [00:46:36] Jeremy: I guess you, you've been saying this whole shifting landscape of how Front End works has made it difficult for Sentry to provide like automatic instrumentation and things like that for, for mobile apps. Is that a different story? Like is it pretty standardized in terms of how do you instrument an Android app or an iOS app. [00:46:58] David: Sort of, but also, no, like, a good example here is like early days mobile, it's a native application. You ship a binary known quantity, right? Or maybe you embedded a web browser, but like, that was like a very different thing. Okay. And then they did things where like, okay, more of it's like embedded web browser type stuff, or dynamically render content. So that's now a moving target. the current version of that, which I'm not a mobile dev so like people have strong opinions on both sides of this fence, but it's like, okay, do you use like a, a hybrid framework which allows you to build. Say, uh, react native, which is like arou you to sort of write a JavaScript ish thing and it runs on both Android and mobile, but not really well on either. Um, or do you write a native, native app, which is like a known quantity, but then you may maintain like two code bases, have two degrees of expertise and stuff. Flutters the same thing. so there's still that version of complexity that goes on within it. And I, I think people care less about mobile cuz it impacts people less. Like, you know, there's that whole generation of like, oh, mobile's the future, everything's gonna be mobile, let's not become true. Uh, mobile's very important, but like we have desktops still. We use web software all the time, half the time on mobile. We're just using the web software at the end of the day, so at least we know that's a thing. And I think, so I think that investment in mobile has died down some. Um, but some companies like mobile is like their main experience or one of their driving experience is like a, like a company like DoorDash, mobile is as important as web, if not more, right? Because of like the types of customers. Spotify probably same thing, but I don't know, Sentry. We don't need a mobile app, who cares? It's irrelevant to the problem space, right? And so I, I think it's just not quite taken on. And so mobile is still like this secondary citizen at a lot of companies, and I think the evolution of it has been like complicated. And so I, I think a lot of the problems are known, but maybe people care less or there's just less customers. And so the weight doesn't, like, the weight is wildly different. Like JavaScript's probably like a hundred times the size from an investment point of view for everyone in the world than say mobile applications are, is how I would think about it. And so whether mobile is or isn't solved is almost irrelevant to the, the, the like general problem at hand. and I think at the very least, like mobile applications, there's like, there's like a tool chain where you can debug a lot of stuff that works fairly well and hasn't changed over the years, whereas like the web you have like browser tools, but that's about it. So. Mobile apps can have large binaries or pull in lots of dependencies at runtime [00:49:16] Jeremy: So I guess with mobile. Um, I was initially thinking of native apps, but you're, you're bringing up that there's actually people who would make a native app that's just a web view for a webpage, or there's React native or there's flutters, so there's actually, it really isn't standard how to make a mobile app. [00:49:36] David: Yeah. And even within those, it comes back to like, okay, is it now the same problem where we're loading in a bunch of JavaScript or downloading a bunch of JavaScript and content remotely and stuff? And like, you'll see this when you install a mobile app, and sometimes the binaries are huge, right? Sometimes they're really small, and then you load it up and it's downloading like several gigs of data and stuff, right? And those are completely different patterns. And even within those like subsets, I'm sure the implementations are wildly different, right? And so, you know, I, that may not be the same as like the runtime kind of changing, but I remember there was this, uh, this must be a decade ago. I, I used, I still am a gamer, but. Um, early in my career I worked a lot with like games like World of Warcraft and stuff, and I remember when games started launching progressive loading where it's like you could download a small chunk of the game and actually start playing and maybe the textures were lower, uh, like resolution and everything was lower fidelity and, and you could only go so far until the game fully installed. But like, imagine like if you're like focused on performance or something like that, measuring it there is completely different than measuring it once, say everything's installed, you know? And so I think those often become very complex use cases. And I think that used to be like an extreme edge case that was like such a, a hyper-specific optimization for like what The Warcraft, which is like one of the biggest games of all time that it made sense, you know, okay, whatever. They can build their own custom tooling and figure it out from there. And now we've taken that degree of complexity and tried to apply it to everything in the world. And it's like uhoh, like nobody has the teams or the, the, the talent or the, the experience to necessarily debug a lot of these complicated problems just like Sentry like. You know, we're not dealing with React internals. If something's wrong in the React internals, it's like somebody might be able to figure it out, but it's gonna take us so much time to figure out what's going on, versus, oh, we're rendering some html. Cool. We understand how it works. It's, it's a known, known problem. We can debug it. Like there's nothing to even debug most of the time. Right. And so, I, I don't know, I think the industry has to get to a place where you can reason about the software, where you have the calculator, right. And you don't have to figure out how the calculator works. You just can trust that it's gonna work for you. How Sentry's stack has become more complex over time [00:51:35] Jeremy: so kind of. Shifting over a little bit to Sentry's internals. You, you said that Sentry started in, was it 2008 you said? [00:51:47] David: Uh, the open source project was in 2008. Yeah. [00:51:50] Jeremy: The stack that's used in Sentry has evolved. Like I remembered that there was a period where I think you could run it with a pretty minimal stack, like I think it may have even supported SQLite. [00:52:02] David: Yeah. [00:52:03] Jeremy: And so it was something that people could run pretty easily on their own. But things have, have obviously changed a lot. And so I, I wonder if you could speak to sort of the evolution of that process. Like when do you decide like, Hey, this thing that I built in 2008, Is, you know, not gonna cut it. And I really need to re-architect what this system is. [00:52:25] David: Yeah, so I don't know if that's actually the reality of why things have changed, that it's like, oh, this doesn't work anymore. We've definitely introduced complexity in the sense of like, probably the biggest shift for Sentry was like, it used to be everything, and it was a SQL database, and everything was kind of optional. I think half that was maintainable because it was mostly built by. And so I could maintain like an architectural vision that kept it minimal. I had the experience to figure it out and duct tape the right things. Um, so that was one thing. And I think eventually, you know, that doesn't scale as you're trying to do more and build more into the product. So there's some complexity there. but for the most part you can, it can still
On today's show, podcaster and economic writer Robert Blumen compares the idealistic dystopic future portrayed in science fiction to the harsh realities of the dystopic future Klaus Schwab is promoting, and explains how and why the definitions of words like "vaccine," "immunity" and "effective" have changed since the arrival of COVID-19. GUEST OVERVIEW: Robert Blumen, a software engineer with a background in financial applications. Blumen, who lives in San Francisco, has a physics degree from Stanford University and writes frequently for mises.org and Brownstone.org. He has spoken at economic and finance conferences on Austrian economics, and he is the editor of the podcast Software Engineering Radio.
Writer and podcaster Robert Blumen returns to discuss the big lie that the COVID "vaccines" had 95% effectiveness, and more. GUEST OVERVIEW: Robert Blumen, a software engineer with a background in financial applications. Blumen, who lives in San Francisco, has a physics degree from Stanford University and writes frequently for mises.org and Brownstone.org. He has spoken at economic and finance conferences on Austrian economics, and he is the editor of the podcast Software Engineering Radio.
Software Engineering Radio - The Podcast for Professional Software Developers
Software engineer Alex Boten, author of Cloud Native Observability with Open Telemetry, joins SE Radio host Robert Blumen for a conversation about software telemetry and the OpenTelemetry project. After a brief review of the topic and the OpenTelemetry project's origins rooted in the need for interoperability between telemetry sources and back ends, they discuss the open telemetry server and its features, including transforms, filtering, sampling, and rate limiting. They consider a range of topics, starting with alternative topologies with and without the telemetry server, server pipelines, and scaling out the server, as well as a detailed look at extension points and extensions; authentication; adoption; and migration.
Luca Casonato is the tech lead for Deno Deploy and a TC39 delegate. Deno is a JavaScript runtime from the original creator of NodeJS, Ryan Dahl. Topics covered: What's a JavaScript runtime How V8 is used Why Deno was created The W3C WinterCG for server-side JavaScript Why it's difficult to ship new features in Node The benefits of web standards Creating an all-inclusive toolset like Rust and Go Deno's node compatibility layer Use cases for WebAssembly Benefits and implementation of Deno Deploy Reasons to deploy on the edge What's coming next Luca Luca Casonato @lcasdev Deno Homepage Deploy Showcase Subhosting Fresh web framework The anatomy of an Isolate Cloud Deno Users Netlify Edge Functions Deno at Slack GitHub Flat Data Shopify Oxygen Other related links Cache Web API V8 (JavaScript and WebAssembly engine) TC39 (JavaScript specification group) Web-interoperable Runtimes Community Group (WinterCG) Cloudflare Workers (Deno Deploy competitor) How Cloudflare KV works CockroachDB (Distributed database) XKCD Standards Comic Transcript You can help edit this transcript on GitHub. [00:00:07] Jeremy: Today I'm talking to Luca Casonato. He's a member of the Deno Core team and a TC 39 Delegate. [00:00:06] Luca: Hey, thanks for having me. What's a runtime? [00:00:07] Jeremy: So today we're gonna talk about Deno, and on the website it says, Deno is a runtime for JavaScript and TypeScript. So I thought we could start with defining what a runtime is. [00:00:21] Luca: Yeah, that's a great question. I think this question actually comes up a lot. It's, it's like sometimes we also define Deno as a headless browser, or I don't know, a, a JavaScript script execution tool. what actually defines runtime? I, I think what makes a runtime a runtime is that it is a, it's implemented in native code. It cannot be self-hosted. Like you cannot self-host a JavaScript runtime. and it executes JavaScript or TypeScript or some other scripting language, without relying on, well, yeah, I guess it's the self-hosting thing. Like it's, it's essentially a, a JavaScript execution engine, which is not self-hosted. So yeah, it, it maybe has IO bindings, but it doesn't necessarily need to like, it. Maybe it allows you to read the, from the file system or, or make network calls. Um, but it doesn't necessarily have to. It's, I think the, the primary definition is something which can execute JavaScript without already being written in JavaScript. How V8 and JavaScript runtimes are related [00:01:20] Jeremy: And when we hear about JavaScript run times, whether it's Deno or Node or Bun, or anything else, we also hear about it in the context of v8. Could you explain the relationship between V8 and a JavaScript run time? [00:01:36] Luca: Yeah. So V8 and, and JavaScript core and Spider Monkey, these are all JavaScript engines. So these are the low level virtual machines that can execute or that can parse your JavaScript code. turn it into byte code, maybe turn it into, compiled machine code, and then execute that code. But these engines, Do not implement any IO functions. They do not. They implement the JavaScript spec as is written. and then they provide extension hooks for, they call these host environments, um, like environments that embed these engines to provide custom functionalities to essentially poke out of the sandbox, out of the, out of the virtual machine. Um, and this is used in browsers. Like browsers have, have these engines built in. This is where they originated from. Um, and then they poke holes into this, um, sandbox virtual machine to do things like, I don't know, writing to the dom or, or console logging or making fetch calls and all these kinds of things. And what a runtime essentially does, a JavaScript runtime is it takes one of these engines and. It then provides its own set of host APIs, like essentially its own set of holes. It pokes into the sandbox. and depending on what the runtime is trying to do, um, the weight will do. This is gonna be different and, and the sort of API that is ultimately exposed to the end user is going to be different. For example, if you compare Deno and node, like node is very loosey goosey, about how it pokes holds into the sandbox, it sort of just pokes them everywhere. And this makes it difficult to enforce things like, runtime permissions for example. Whereas Deno is much more strict about how it, um, pokes holds into its sandbox. Like everything is either a web API or it's behind in this Deno name space, which means that it's, it's really easy to find, um, places where, where you're poking out of the sandbox. and really you can also compare these to browsers. Like browsers are also JavaScript run times. Um, they're just not headless. JavaScript run times, but JavaScript run times that also have a ui. and. . Yeah. Like there, there's, there's a whole Bunch of different kinds of JavaScript run times, and I think we're also seeing a lot more like embedded JavaScript run times. Like for example, if you've used React Native before, you, you may be using Hermes as a, um, JavaScript engine in your Android app, which is like a custom JavaScript engine written just for, for, for React native. Um, and this also is embedded within a, like react native run time, which is specific to React native. so it's also possible to have run times, for example, that are, that can be where the, where the back backing engine can be exchanged, which is kind of cool. [00:04:08] Jeremy: So it sounds like V8's role, one way to look at it is it can execute JavaScript code, but only pure functions. I suppose you [00:04:19] Luca: Pretty much. Yep. [00:04:21] Jeremy: Do anything that doesn't interact with IO so you think about browsers, you were mentioning you need to interact with a DOM or if you're writing a server side application, you probably need to receive or make HTTP requests, that sort of thing. And all of that is not handled by v8. That has to be handled by an external runtime. [00:04:43] Luca: Exactly Like, like one, one. There's, there's like some exceptions to this. For example, JavaScript technically has some IO built in with, within its standard library, like math, random. It's like random number. Generation is technically an IO operation, so, Technically V8 has some IO built in, right? And like getting the current date from the user, that's also technically IO So, like there, there's some very limited edge cases. It's, it's not that it's purely pure, but V8 for example, has a flag to turn it completely deterministic. which means that it really is completely pure. And this is not something which run times usually have. This is something like the feature of an engine because the engine is like so low level that it can essentially, there's so little IO that it's very easy to make deterministic where a runtime higher level, um, has, has io, um, much more difficult to make deterministic. [00:05:39] Jeremy: And, and for things like when you're working with JavaScript, there's, uh, asynchronous programming [00:05:46] Luca: mm-hmm. Concurrent JavaScript execution [00:05:47] Jeremy: So you have concurrency and things like that. Is that a part of V8 or is that the responsibility of the run time? [00:05:54] Luca: That's a great question. So there's multiple parts to this. There's the part, um, there, there's JavaScript promises, um, and sort of concurrent Java or well, yes, concurrent JavaScript execution, which is sort of handled by v8, like v8. You can in, in pure v8, you can create a promise, and you can execute some code within that promise. But without IO there's actually no way to defer time, uh, which means that in with pure v8, you can either, you can create a promise. Which executes right now. Or you can create a promise that never executes, but you can't create a promise that executes in 10 seconds because there's no way to measure 10 seconds asynchronously. What run times do is they add something called an event loop on top of this, um, on top of the base engine and that event loop, for example, like a very simple event loop, for example, might have a timer in it, which every second looks at if there's a timer schedule to run within that second. And if it does, if, if that timer exists, it'll go call out to V8 and say, you can now execute that promise. but V8 is still the one that's keeping track of, of like which promises exist, and the code that is meant to be invoked when they resolve all that kind of thing. Um, but the underlying infrastructure that actually invokes which promises get resolved at what point in time, like the asynchronous, asynchronous IO is what this is called. This is driven by the event loop, um, which is implemented by around time. So Deno, for example, it uses, Tokio for its event loop. This is a, um, an event loop written in Rust. it's very popular in the Rust ecosystem. Um, node uses libuv. This is a relatively popular runtime or, or event loop, um, implementation for c uh, plus plus. And, uh, libuv was written for Node. Tokio was not written for Deno. But um, yeah, Chrome has its own event loop implementation. Bun has its own event loop implementation. [00:07:50] Jeremy: So we, we might go a little bit more into that later, but I think what we should probably go into now is why make Deno, because you have Node that's, uh, currently very popular. The co-creator of Deno, to my understanding, actually created Node. So maybe you could explain to our audience what was missing or what was wrong with Node, where they decided I need to create, a new runtime. Why create a new runtime? (standards compliance) [00:08:20] Luca: Yeah. So the, the primary point of concern here was that node was slowly diverging from browser standards with no real path to, to, to, re converging. Um, like there was nothing that was pushing node in the direction of standards compliance and there was nothing, that was like sort of forcing node to innovate. and we really saw this because in the time between, I don't know, 2015, 2018, like Node was slowly working on esm while browsers had already shipped ESM for like three years. , um, node did not have fetch. Node hasn't had, or node only at, got fetch last year. Right? six, seven years after browsers got fetch. Node's stream implementation is still very divergent from, from standard web streams. Node was very reliant on callbacks. It still is, um, like promises in many places of the Node API are, are an afterthought, which makes sense because Node was created in a time before promises existed. Um, but there was really nothing that was pushing Node forward, right? Like nobody was actively investing in, in, in improving the API of Node to be more standards compliant. And so what we really needed was a new like Greenfield project, which could demonstrate that actually writing a new server side run. Is A viable, and b is totally doable with an API that is more standards combined. Like essentially you can write a browser, like a headless browser and have that be an excellent to use JavaScript runtime, right? And then there was some things that were I on top of that, like a TypeScript support because TypeScript was incredibly, or is still incredibly popular. even more so than it was four years ago when, when Deno was created or envisioned, um, this permission system like Node really poked holes into the V8 sandbox very early on with, with like, it's gonna be very difficult for Node to ever, ever, uh, reconcile this, this. Especially cuz the, some, some of the APIs that it, that it exposes are just so incredibly low level that like, I don't know, you can mutate random memory within your process. Um, which like if you want to have a, a secure sandbox like that just doesn't work. Um, it's not compatible. So there was really needed to be a place where you could explore this, um, direction and, and see if it worked. And Deno was that. Deno still is that, and I think Deno has outgrown that now into something which is much more usable as, as like a production ready runtime. And many people do use it, in production. And now Deno is on the path of slowly converging back with Node, um, in from both directions. Like Node is slowly becoming more standards compliant. and depending on who you ask this was, this was done because of Deno and some people said it would had already been going on and Deno just accelerated it. but that's not really relevant because the point is that like Node is becoming more standard compliant and, and the other direction is Deno is becoming more node compliant. Like Deno is implementing node compatibility layers that allow you to run code that was originally written for the node ecosystem in the standards compliant run time. so through those two directions, the, the run times are sort of, um, going back towards each other. I don't think they'll ever merge. but we're, we're, we're getting to a point here pretty soon, I think, where it doesn't really matter what runtime you write for, um, because you'll be able to write code written for one runtime in the other runtime relatively easily. [00:12:03] Jeremy: If you're saying the two are becoming closer to one another, becoming closer to the web standard that runs in the browser, if you're talking to someone who's currently developing in node, what's the incentive for them to switch to Deno versus using Node and then hope that eventually they'll kind of meet in the middle. [00:12:26] Luca: Yeah, so I think, like Deno is a lot more than just a runtime, right? Like a runtime executes JavaScript, Deno executes JavaScript, it executes type script. But Deno is so much more than that. Like Deno has a built-in format, or it has a built-in linter. It has a built-in testing framework, a built-in benching framework. It has a built-in Bundler, it, it like can create self-hosted, um, executables. yeah, like Bundle your code and the Deno executable into a single executable that you can trip off to someone. Um, it has a dependency analyzer. It has editor integrations. it has, Yeah. Like I could go on for hours, (laughs) about all of the auxiliary tooling that's inside of Deno, that's not a JavaScript runtime. And also Deno as a JavaScript runtime is just more standards compliant than any of the other servers at Runtimes right now. So if, if you're really looking for something which is standards complaint, which is gonna like live on forever, then it's, you know, like you cannot kill off the Fetch API ever. The Fetch API is going to live forever because Chrome supports it. Um, and the same goes for local storage and, and like, I don't know, the Blob API and all these other web APIs like they, they have shipped and browsers, which means that they will be supported until the end of time. and yeah, maybe Node has also reached that with its api probably to some extent. but yeah, don't underestimate the power of like 3 billion Chrome users. that would scream immediately if the Fetch API stopped working Right? [00:13:50] Jeremy: Yeah, I, I think maybe what it sounds like also is that because you're using the API that's used in the browser places where you deploy JavaScript applications in the future, you would hope that those would all settle on using that same API so that if you were using Deno, you could host it at different places and not worry about, do I need to use a special API maybe that you would in node? WinterCG (W3C group for server side JavaScript) [00:14:21] Luca: Yeah, exactly. And this is actually something which we're specifically working towards. So, I don't know if you've, you've heard of WinterCG? It's a, it's a community group at the W3C that, um, CloudFlare and, and Deno and some others including Shopify, have started last year. Um, we're essentially, we're trying to standardize the concept of what a server side JavaScript runtime is and what APIs it needs to have available to be standards compliant. Um, and essentially making this portability sort of written down somewhere and like write down exactly what code you can write and expect to be portable. And we can see like that all of the big, all of the big players that are involved in, in, um, building JavaScript run times right now are, are actively, engaged with us at WinterCG and are actively building towards this future. So I would expect that any code that you write today, which runs. in Deno, runs in CloudFlare, workers runs on Netlify Edge functions, runs on Vercel's Edge, runtime, runs on Shopify Oxygen, is going to run on the other four. Um, of, of those within the next couple years here, like I think the APIs of these is gonna converge to be essentially the same. there's obviously gonna always be some, some nuances. Um, like, I don't know, Chrome and Firefox and Safari don't perfectly have the same API everywhere, right? Like Chrome has some web Bluetooth capabilities that Safari doesn't, or Firefox has some, I don't know, non-standard extensions to the error object, which none of the other runtimes do. But overall you can expect these front times to mostly be aligned. yeah, and I, I think that's, that's really, really, really excellent and that, that's I think really one of the reasons why one should really consider, like building for, for this standard runtime because it, it just guarantees that you'll be able to host this somewhere in five years time and 10 years time, with, with very little effort. Like even if Deno goes under or CloudFlare goes under, or, I don't know, nobody decides to maintain node anymore. It'll be easy to, to run somewhere else. And also I expect that the big cloud vendors will ultimately, um, provide, manage offerings for, for the standards compliant JavaScript on time as well. Is Node part of WinterCG? [00:16:36] Jeremy: And this WinterCG group is Node a part of that as well? [00:16:41] Luca: Um, yes, we've invited Node, um, to join, um, due to the complexities of how node's, internal decision making system works. Node is not officially a member of WinterCG. Um, there is some individual members of the node, um, technical steering committee, which are participating. for example, um, James m Snell is, is the co-chair, is my co-chair on, on WinterCG. He also works at CloudFlare. He's also a node, um, TSC member, Mateo Colina, who has been, um, instrumental to getting fetch landed in Node, um, is also actively involved. So Node is involved, but because Node is node and and node's decision making process works the way it does, node is not officially listed anywhere as as a member. but yeah, they're involved and maybe they'll be a member at some point. But, yeah, let's. , see (laughs) [00:17:34] Jeremy: Yeah. And, and it, so it, it sounds like you're thinking that's more of a, a governance or a organizational aspect of note than it is a, a technical limitation. Is that right? [00:17:47] Luca: Yeah. I obviously can't speak for the node technical steering committee, but I know that there's a significant chunk of the node technical steering committee that is, very favorable towards, uh, standards compliance. but parts of the Node technical steering committee are also not, they are either indifferent or are actively, I dunno if they're still actively working against this, but have actively worked against standards compliance in the past. And because the node governance structure is very, yeah, is, is so, so open and let's, um, and let's, let's all these voices be heard, um, that just means that decision making processes within Node can take so long, like. . This is also why the fetch API took eight years to ship. Like this was not a technical problem. and it is also not a technical problem. That Node does not have URL pattern support or, the file global or, um, that the web crypto API was not on this, on the global object until like late last year, right? Like, these are not technical problems, these are decision making problems. Um, and yeah, that was also part of the reason why we started Deno as, as like a separate thing, because like you can try to innovate node, from the inside, but innovating node from the inside is very slow, very tedious, and requires a lot of fighting. And sometimes just showing somebody, from the outside like, look, this is the bright future you could have, makes them more inclined to do something. Why it takes so long to ship new features in Node [00:19:17] Jeremy: Do, do you have a sense for, you gave the example of fetch taking eight years to, to get into node. Do you, do you have a sense of what the typical objection is to, to something like that? Like I, I understand there's a lot of people involved, but why would somebody say, I, I don't want this [00:19:35] Luca: Yeah. So for, for fetch specifically, there was a, there was many different kinds of concerns. Um, one of the, I, I can maybe list two of them. One of them was for example, that the fetch API is not a good API and as such, node should not have it. which is sort of. missing the point of, because it's a standard API, how good or bad the API is is much less relevant because if you can share the API, you can also share a wrapper that's written around the api. Right? and then the other concern was, node does need fetch because Node already has an HTTP API. Um, so, so these are both kind of examples of, of concerns that people had for a long time, which it took a long time to either convince these people or, or to, push the change through anyway. and this is also the case for, for other things like, for example, web, crypto, um, like why do we need web crypto? We already have node crypto, or why do we need yet another streams? Implementation node already has four different streams implementations. Like, why do we need web streams? and the, the. Like, I don't know if you know this XKCD of, there's 14 competing standards. so let's write a 15th standard, to unify them all. And then at the end we just have 15 competing standards. Um, so I think this is also the kind of concern that people were concerned about, but I, I think what we've seen here is that this is really not a concern that one needs to have because it ends up that, or it turns out in the end that if you implement web APIs, people will use web APIs and will use web APIs only for their new code. it takes a while, but we're seeing this with ESM versus require like new code written with require much less common than it was two years ago. And, new code now using like Xhr, whatever it's called, form request or. You know, the one, I mean, compared to using Fetch, like nobody uses that name. Everybody uses Fetch. Um, and like in Node, if you write a little script, like you're gonna use Fetch, you're not gonna use like Nodes, htp, dot get API or whatever. and we're gonna see the same thing with Readable Stream. We're gonna see the same thing with Web Crypto. We're gonna see, see the same thing with Blob. I think one of the big ones where, where Node is still, I, I, I don't think this is one that's ever gonna get solved, is the, the Buffer global and Node. like we have the Uint8, this Uint8 global, um, and like all the run times including browsers, um, and Buffer is like a super set of that, but it's in global scope. So it, it's sort of this non-standard extension of unit eight array that people in node like to use and it's not compatible with anything else. Um, but because it's so easy to get at, people use it anyway. So those are, those are also kind of problems that, that we'll have to deal with eventually. And maybe that means that at some point the buffer global gets deprecated and I don't know, probably can never get removed. But, um, yeah, these are kinds of conversations that the no TSE is going have to have internally in, I don't know, maybe five years. Write once, have it run on any hosting platform [00:22:37] Jeremy: Yeah, so at a high level, What's shipped in the browser, it went through the ECMAScript approval process. People got it into the browser. Once it's in the browser, probably never going away. And because of that, it's safe to build on top of that for these, these server run times because it's never going away from the browser. And so everybody can kind of use it into the future and not worry about it. Yeah. [00:23:05] Luca: Exactly. Yeah. And that's, and that's excluding the benefit that also if you have code that you can write once and use in both the browser and the server side around time, like that's really nice. Um, like that, that's the other benefit. [00:23:18] Jeremy: Yeah. I think that's really powerful. And that right now, when someone's looking at running something in CloudFlare workers versus running something in the browser versus running something in. it's, I think a lot of people make the assumption it's just JavaScript, so I can use it as is. But it, it, there are at least currently, differences in what APIs are available to you. [00:23:43] Luca: Yep. Yep. Why bundle so many things into Deno? [00:23:46] Jeremy: Earlier you were talking about how Deno is more than just the runtime. It has a linter, formatter, file watcher there, there's all sorts of stuff in there. And I wonder if you could talk a little bit to the, the reasoning behind that [00:24:00] Luca: Mm-hmm. [00:24:01] Jeremy: Having them all be separate things. [00:24:04] Luca: Yeah, so the, the reasoning here is essentially if you look at other modern run time or mo other modern languages, like Rust is a great example. Go is a great example. Even though Go was designed around the same time as Node, it has a lot of these same tools built in. And what it really shows is that if the ecosystem converges, like is essentially forced to converge on a single set of built-in tooling, a that built-in tooling becomes really, really excellent because everybody's using it. And also, it means that if you open any project written by any go developer, any, any rest developer, and you look at the tests, you immediately understand how the test framework works and you immediately understand how the assertions work. Um, and you immediately understand how the build system works and you immediately understand how the dependency imports work. And you immediately understand like, I wanna run this project and I wanna restart it when my file changes. Like, you immediately know how to do that because it's the same everywhere. Um, and this kind of feeling of having to learn one tool and then being able to use all of the projects, like being able to con contribute to open source when you're moving jobs, whatever, like between personal projects that you haven't touched in two years, you know, like being able to learn this once and then use it everywhere is such an incredibly powerful tool. Like, people don't appreciate this until they've used a runtime or, or, or language which provides this to them. Like, you can go to any go developer and ask them if they would like. There, there's this, there's this saying in the Go ecosystem, um, that Go FMT is nobody's favorite, but, or, uh, wait, no, I don't remember what the, how the saying goes, but the saying essentially implies that the way that go FMT formats code, maybe not everybody likes, but everybody loves go F M T anyway, because it just makes everything look the same. And like, you can read your friend's code, your, your colleagues code, your new jobs code, the same way that you did your code from two years ago. And that's such an incredibly powerful feeling. especially if it's like well integrated into your IDE you clone a repository, open that repository, and like your testing panel on the left hand side just populates with all the tests, and you can click on them and run them. And if an assertion fails, it's like the standard output format that you're already familiar with. And it's, it's, it's a really great feeling. and if you don't believe me, just go try it out and, and then you will believe me, (laughs) [00:26:25] Jeremy: Yeah. No, I, I'm totally with you. I, I think it's interesting because with JavaScript in particular, it feels like the default in the community is the opposite, right? There's so many different ways. Uh, there are so many different build tools and testing frameworks and, formatters, and it's very different than, like you were mentioning, a go or a Rust that are more recent languages where they just include that, all Bundled in. Yeah. [00:26:57] Luca: Yeah, and I, I think you can see this as well in, in the time that average JavaScript developer spends configuring their tooling compared to a rest developer. Like if I write Rust, I write Rust, like all day, every day. and I spend maybe two, 3% of my time configuring Rust tooling like. Doing dependency imports, opening a new project, creating a format or config file, I don't know, deleting the build directory, stuff like that. Like that's, that's essentially what it means for me to configure my rest tooling. Whereas if you compare this to like a front-end JavaScript project, like you have to deal with making sure that your React version is compatible with your React on version, it's compatible with your next version is compatible with your ve version is compatible with your whatever version, right? this, this is all not automatic. Making sure that you use the right, like as, as a front end developer, you developer. You don't have just NPM installed, no. You have NPM installed, you have yarn installed, you have PNPM installed. You probably have like, Bun installed. And, and, and I don't know to use any of these, you need to have corepack enabled in Node and like you need to have all of their global bin directories symlinked into your or, or, or, uh, included in your path. And then if you install something and you wanna update it, you don't know, did I install it with yarn? Did I install it with N pNPM? Like this is, uh, significant complexity and you, you tend to spend a lot of time dealing with dependencies and dealing with package management and dealing with like tooling configuration, setting up esent, setting up prettier. and I, I think that like, especially Prettier, for example, really showed, was, was one of the first things in the JavaScript ecosystem, which was like, no, we're not gonna give you a config where you, that you can spend like six hours configuring, it's gonna be like seven options and here you go. And everybody used it because, Nobody likes configuring things. It turns out, um, and even though there's always the people that say, oh, well, I won't use your tool unless, like, we, we get this all the time. Like, I'm not gonna use Deno FMT because I can't, I don't know, remove the semicolons or, or use single quotes or change my tab width to 16. Right? Like, wait until all of your coworkers are gonna scream at you because you set the tab width to 16 and then see what they change it to. And then you'll see that it's actually the exact default that, everybody uses. So it'll, it'll take a couple more years. But I think we're also gonna get there, uh, like Node is starting to implement a, a test runner. and I, I think over time we're also gonna converge on, on, on, on like some standard build tools. Like I think ve, for example, is a great example of this, like, Doing a front end project nowadays. Um, like building new front end tooling that's not built on Vite Yeah. Don't like, Vite's it's become the standard and I think we're gonna see that in a lot more places. We should settle on what tools to use [00:29:52] Jeremy: Yeah, though I, I think it's, it's tricky, right? Because you have so many people with their existing projects. You have people who are starting new projects and they're just searching the internet for what they should use. So you're, you're gonna have people on web pack, you're gonna have people on Vite, I guess now there's gonna be Turbo pack, I think is another one that's [00:30:15] Luca: Mm-hmm. [00:30:16] Jeremy: There's, there's, there's all these different choices, right? And I, I think it's, it's hard to, to really settle on one, I guess, [00:30:26] Luca: Yeah, [00:30:27] Jeremy: uh, yeah. [00:30:27] Luca: like I, I, I think this is, this is in my personal opinion also failure of the Node Technical Steering committee, for the longest time to not decide that yes, we're going to bless this as the standard format for Node, and this is the standard package manager for Node. And they did, they sort of did, like, they, for example, node Blessed NPM as the standard, package manager for N for for node. But it didn't innovate on npm. Like no, the tech nodes, tech technical steering committee did not force NPM to innovate NPMs, a private company ultimately bought by GitHub and they had full control over how the NPM cli, um, evolved and nobody forced NPM to, to make sure that package install times are six times faster than they were. Three years ago, like nobody did that. so it didn't happen. And I think this is, this is really a failure of, of the, the, the, yeah, the no technical steering committee and also the wider JavaScript ecosystem of not being persistent enough with, with like focus on performance, focus on user experience, and, and focus on simplicity. Like things got so out of hand and I'm happy we're going in the right direction now, but, yeah, it was terrible for some time. (laughs) Node compatibility layer [00:31:41] Jeremy: I wanna talk a little bit about how we've been talking about Deno in the context of you just using Deno using its own standard library, but just recently last year you added a compatibility shim where people are able to use node libraries in Deno. [00:32:01] Luca: Mm-hmm. [00:32:01] Jeremy: And I wonder if you could talk to, like earlier you had mentioned that Deno has, a different permissions model. on the website it mentions that Deno's HTTP server is two times faster than node in a Hello World example. And I'm wondering what kind of benefits people will still get from Deno if they choose to use packages from Node. [00:32:27] Luca: Yeah, it's a great question. Um, so I think a, again, this is sort of a like, so just to clarify what we actually implemented, like what we have is we have support for you to import NPM packages. Um, so you can import any NPM package from NPM, from your type script or JavaScript ECMAScript module, um, that you have, you already have for your Deno code. Um, and we will under the hood, make sure that is installed somewhere in some directory globally. Like PNPM does. There's no local node modules folder you have to deal with. There's no package of Jason you have to deal with. Um, and there's no, uh, package. Jason, like versioning things you need to deal with. Like what you do is you do import cowsay from NPM colon cowsay at one, and that will import cowsay with like the semver tag one. Um, and it'll like do the sim resolution the same way node does, or the same way NPM does rather. And what you get from that is that essentially it gives you like this backdoor to a callout to all of the existing node code that Isri been written, right? Like you cannot expect that Deno developers, write like, I don't know. There was this time when Deno did not really have that many, third party modules yet. It was very early on, and I don't know the, you either, if you wanted to connect to Postgres and there was no Postgres driver available, then the solution was to write your own Postgres driver. And that is obviously not great. Um, (laughs) . So the better solution here is to let users for these packages where there's no Deno native or, or, or web native or standard native, um, package for this yet that is importable with url. Um, specifiers, you can import this from npm. Uh, so it's sort of this like backdoor into the existing NPM ecosystem. And we explicitly, for example, don't allow you to, create a package.json file or, import bare node specifiers because we don't, we, we want to stay standards compliant here. Um, but to make this work effectively, we need to give you this little back door. Um, and inside of this back door. All hell is like, or like everything is terrible inside there, right? Like inside there you can do bare specifiers and inside there you can like, uh, there's package.json and there's crazy node resolution and underscore underscore DIRNAME and common js. And like all of that stuff is supported inside of this backdoor to make all the NPM packages work. But on the outside it's exposed as this nice, ESM only, NPM specifiers. and the, the reason you would want to use this over, like just using node directly is because again, like you wanna use TypeScript, no config, like necessary. You want to use, you wanna have a formatter you wanna have a linter, you wanna have tooling that like does testing and benchmarking and compiling or whatever. All of that's built in. You wanna run this on the edge, like close to your users and like 30 different, 35 different, uh, points of presence. Um, it's like, Okay, push it to your git repository. Go to this website, click a button two times, and it's running in 35 data centers. like this is, this is the kind of ex like developer experience that you can, you do not get. You, I will argue that you cannot get with Node right now. Like even if you're using something like ts-node, it is not possible to get the same level of developer experience that you do with Deno. And the, the, the same like speed at which you can iterate, iterate on your projects, like create new projects, iterate on them is like incredibly fast in Deno. Like, I can open a, a, a folder on my computer, create a single file, may not ts, put some code in there and then call Deno Run may not. And that's it. Like I don't, I did not need to do NPM install I did not need to do NPM init -y and remove the license and version fields and from, from the generated package.json and like set private to true and whatever else, right? It just all works out of the box. And I think that's, that's what a lot of people come to deno for and, and then ultimately stay for. And also, yeah, standards compliance. So, um, things you build in Deno now are gonna work in five, 10 years, with no hassle. Node shims and testing [00:36:39] Jeremy: And so with this compatibility layer or this, this shim, is it where the node code is calling out to node APIs and you're replacing those with Deno compatible equivalents? [00:36:54] Luca: Yeah, exactly. Like for example, we have a shim in place that shims out the node crypto API on top of the web crypto api. Like sort of, some, some people may be familiar with this in the form of, um, Browserify shims. if anybody still remembers those, it's essentially. , your front end tooling, you were able to import from like node crypto in your front end projects and then behind the scenes your web packs or your browser replies or whatever would take that import from node crypto and would replace it with like the shim that was essentially exposed the same APIs node crypto, but under the hood, wasn't implemented with native calls, but was implemented on top of web crypto, or implemented in user land even. And Deno does something similar. there's a couple edge cases of APIs that there's, where, where we do not expose the underlying thing that we shim to, to end users, outside of the node shim. So like there's some, some APIs that I don't know if I have a good example, like node nextTick for example. Um, like to properly be able to shim node nextTick, you need to like implement this within the event loop in the runtime. and. , you don't need this in Deno, because Deno, you use the web standard queueMicrotask to, to do this kind of thing. but to be able to shim it correctly and run node applications correctly, we need to have this sort of like backdoor into some ugly APIs, um, which, which natively integrate in the runtime, but, yeah, like allow, allow this node code to run. [00:38:21] Jeremy: A, anytime you're replacing a component with a, a shim, I think there's concerns about additional bugs or changes in behavior that can be introduced. Is that something that you're seeing and, and how are you accounting for that? [00:38:38] Luca: Yeah, that's, that's an excellent question. So this is actually a, a great concern that we have all the time. And it's not just even introducing bugs, sometimes it's removing bugs. Like sometimes there's bugs in the node standard library which are there, and people are relying on these bugs to be there for the applications to function correctly. And we've seen this a lot, and then we implement this and we implement from scratch and we don't make that same bug. And then the test fails or then the application fails. So what we do is, um, we actually run node's test suite against Deno's Shim layer. So Node has a very extensive test suite for its own standard library, and we can run this suite against, against our shims to find things like this. And there's still edge cases, obviously, which node, like there was, maybe there's a bug which node was not even aware of existing. Um, where maybe this, like it's is, it's now standard, it's now like intended behavior because somebody relies on it, right? Like the second somebody relies on, on some non-standard or some buggy behavior, it becomes intended. Um, but maybe there was no test that explicitly tests for this behavior. Um, so in that case we'll add our own tests to, to ensure that. But overall we can already catch a lot of these by just testing, against, against node's tests. And then the other thing is we run a lot of real code, like we'll try run Prisma and we'll try run Vite and we'll try run NextJS and we'll try run like, I don't know, a bunch of other things that people throw at us and, check that they work and they work and there's no bugs. Then we did our job well and our shims are implemented correctly. Um, and then there's obviously always the edge cases where somebody did something absolutely crazy that nobody thought possible. and then they'll open an issue on the Deno repo and we scratch our heads for three days and then we'll fix it. And then in the next release there'll be a new bug that we added to make the compatibility with node better. so yeah, but I, yeah. Running tests is the, is the main thing running nodes test. Performance should be equal or better [00:40:32] Jeremy: Are there performance implications? If someone is running an Express App or an NextJS app in Deno, will they get any benefits from the Deno runtime and performance? [00:40:45] Luca: Yeah. It's actually, there is performance implications and they're usually. The opposite of what people think they are. Like, usually when you think of performance implications, it's always a negative thing, right? It's always okay. Like you, it's like a compromise. like the shim layer must be slower than the real node, right? It's not like we can run express faster than node can run, express. and obviously not everything is faster in Deno than it is in node, and not everything is faster in node than it is in Deno. It's dependent on the api, dependent on, on what each team decided to optimize. Um, and this also extends to other run times. Like you can always cherry pick results, like, I don't know, um, to, to make your runtime look faster in certain benchmarks. but overall, what really matters is that you do not like, the first important step for for good node compatibility is to make sure that if somebody runs your code or runs their node code in Deno or your other run type or whatever, It performs at least the same. and then anything on top of that great cherry on top. Perfect. but make sure the baselines is at least the same. And I think, yeah, we have very few APIs where we behave, where we, where, where like there's a significant performance degradation in Deno compared to Node. Um, and like we're actively working on these things. like Deno is not a, a, a project that's done, right? Like we have, I think at this point, like 15 or 16 or 17 engineers working on Deno, spanning across all of our different projects. And like, we have a whole team that's dedicated to performance, um, and a whole team that's dedicated node compatibility. so like these things get addressed and, and we make patch releases every week and a minor release every four weeks. so yeah, it's, it's not a standstill. It's, uh, constantly improving. What should go into the standard library? [00:42:27] Jeremy: Uh, something that kind of makes Deno stand out as it's standard library. There's a lot more in there than there is in in the node one. [00:42:38] Luca: Mm-hmm. [00:42:39] Jeremy: Uh, I wonder if you could speak to how you make decisions on what should go into it. [00:42:46] Luca: Yeah, so early on it was easier. Early on, the, the decision making process was essentially, is this something that a top 100 or top 1000 NPM library implements? And if it is, let's include it. and the decision making is still short of based on that. But right now we've already implemented most of the low hanging fruit. So things that we implement now are, have, have discussion around them whether we should implement them. And we have a process where, well we have a whole team of engineers on our side and we also have community members that, that will review prs and, and, and make comments. Open issues and, and review those issues, to sort of discuss the pros and cons of adding any certain new api. And sometimes it's also that somebody opens an issue that's like, I want, for example, I want an API to, to concatenate two unit data arrays together, which is something you can really easily do node with buffer dot con cat, like the scary buffer thing. and there's no standards way of doing that right now. So we have to have a little utility function that does that. But in parallel, we're thinking about, okay, how do we propose, an addition to the web standards now that makes it easy to concatenate iterates in the web standards, right? yeah, there's a lot to it. Um, but it's, it's really, um, it's all open, like all of our, all of our discussions for, for, additions to the standard library and things like that. It's all, all, uh, public on GitHub and the GitHub issues and GitHub discussions and GitHub prs. Um, so yeah, that's, that's where we do that. [00:44:18] Jeremy: Yeah, cuz to give an example, I was a little surprised to see that there is support for markdown front matter built into the standard library. But when you describe it as we look at the top a hundred thousand packages, are people looking at markdown? Are they looking at front matter? I, I'm sure there's a fair amount that are so that that makes sense. [00:44:41] Luca: Yeah, like it sometimes, like that one specifically was driven by, like, our team was just building a lot of like little blog pages and things like that. And every time it was either you roll your own front matter part or you look for one, which has like a subtle bug here and the other one has a subtle bug there and really not satisfactory with any of them. So, we, we roll that into the standard library. We add good test coverage for it good, add good documentation for it, and then it's like just a resource that people can rely on. Um, and you don't, you then don't have to make the choice of like, do I use this library to do my front meta parsing or the other library? No, you just use the one that's in the standard library. It's, it's also part of this like user experience thing, right? Like it's just a much nicer user experience, not having to make a choice, about stuff like that. Like completely inconsequential stuff. Like which library do we use to do front matter parsing? (laughs) [00:45:32] Jeremy: yeah. I mean, I think when, when that stuff is not there, then I think the temptation is to go, okay, let me see what node modules there are that will let me parse the front matter. Right. And then it, it sounds like probably ideally you want people to lean more on what's either in the standard library or what's native to the Deno ecosystem. Yeah. [00:46:00] Luca: Yeah. Like the, the, one of the big benefits is that the Deno Standard Library is implemented on top of web standards, right? Like it's, it's implemented on top of these standard APIs. so for example, there's node front matter libraries which do not run in the browser because the browser does not have the buffer global. maybe it's a nice library to do front matter pricing with, but. , you choose it and then three days later you decide that actually this code also needs to run in the browser, and then you need to go switch your front matter library. Um, so, so those are also kind of reasons why we may include something in Strand Library, like maybe there's even really good module already to do something. Um, but if there's certain reliance on specific node features that, um, we would like that library to also be compatible with, with, with web standards, we'll, uh, we might include in the standard library, like for example, YAML Parser, um, or the YAML Parser in the standard library is, is a fork of, uh, of the node YAML module. and it's, it's essentially that, but cleaned up and, and made to use more standard APIs rather than, um, node built-ins. [00:47:00] Jeremy: Yeah, it kind of reminds me a little bit of when you're writing a front end application, sometimes you'll use node packages to do certain things and they won't work unless you have a compatibility shim where the browser can make use of certain node APIs. And if you use the APIs that are built into the browser already, then you won't, you won't need to deal with that sort of thing. [00:47:26] Luca: Yeah. Also like less Bundled size, right? Like if you don't have to shim that, that's less, less code you have to ship to the client. WebAssembly use cases [00:47:33] Jeremy: Another thing I've seen with Deno is it supports running web assembly. [00:47:40] Luca: Mm-hmm. [00:47:40] Jeremy: So you can export functions and call them from type script. I was curious if you've seen practical uses of this in production within the context of Deno. [00:47:53] Luca: Yeah. there's actually a Bunch of, of really practical use cases, so probably the most executed bit of web assembly inside of Deno right now is actually yes, build like, yes, build has a web assembly, build like yeses. Build is something that's written and go. You have the choice of either running. Um, natively in machine code as, as like an ELF process on, on Linux or on on Windows or whatever. Or you can use the web assembly build and then it runs in web assembly. And the web assembly build is maybe 50% slower than the, uh, native build, but that is still significantly faster than roll up or, or, or, or I don't know, whatever else people use nowadays to do JavaScript Bun, I don't know. I, I just use es build always, um, So, um, for example, the Deno website, is running on Deno Deploy. And Deno Deploy does not allow you to run Subprocesses because it's, it's like this edge run time, which, uh, has certain security permissions that it's, that are not granted, one of them being sub-processes. So it needs to execute ES build. And the way it executes es build is by running them inside a web assembly. Um, because web assembly is secure, web assembly is, is something which is part of the JavaScript sandbox. It's inside the JavaScript sandbox. It doesn't poke any holes out. Um, so it's, it's able to run within, within like very strict security context. . Um, and then other examples are, I don't know, you want to have a HTML sanitizer, which is actually built on the real HTML par in a browser. we, we have an hdml sanitizer called com or, uh, ammonia, I don't remember. There's, there's an HTML sanitizer library on denoland slash x, which is built on the html parser from Firefox. Uh, which like ensures essentially that your html, like if you do HTML sanitization, you need to make sure your HTML par is correct, because if it's not, you might like, your browser might parse some HTML one way and your sanitizer pauses it another way and then it doesn't sanitize everything correctly. Um, so there's this like the Firefox HTML parser compiled to web assembly. Um, you can use that to. HTML sanitization, or the Deno documentation generation tool, for example. Uh, Deno Doc, there's a web assembly built for it that allows you to programmatically, like generate documentation for, for your type script modules. Um, yeah, and, and also like, you know, deno fmt is available as a WebAssembly module for programmatic access and a Bunch of other internal Deno, programs as well. Like, or, uh, like components, not programs. [00:50:20] Jeremy: What are some of the current limitations of web assembly and Deno for, for example, from web assembly, can I make HTTP requests? Can I read files? That sort of thing. [00:50:34] Luca: Mm-hmm. . Yeah. So web assembly, like when you spawn as web assembly, um, they're called instances, WebAssembly instances. It runs inside of the same vm, like the same, V8 isolate is what they're called, but. it does not have it, it's like a completely fresh sandbox, sort of, in the sense that I told you that between a runtime and like an engine essentially implements no IO calls, right? And a runtime does, like a runtime, pokes holds into the, the, the engine. web assembly by default works the same way that there is no holes poked into its sandbox. So you have to explicitly poke some holes. Uh, if you want to do HTTP calls, for example, when, when you create web assembly instance, it gives you, or you can give it something called imports, uh, which are essentially JavaScript function bindings, which you can call from within the web assembly. And you can use those function bindings to do anything you can from JavaScript. You just have to pass them through explicitly. and. . Yeah. Depending on how you write your web assembly, like if you write it in Rust, for example, the tooling is very nice and you can just call some JavaScript code from your Rust, and then the build system will automatically make sure that the right function bindings are passed through with the right names. And like, you don't have to deal with anything. and if you're writing go, it's slightly more complicated. And if you're writing like raw web assembly, like, like the web assembly, text format and compiling that to a binary, then like you have to do everything yourself. Right? It's, it's sort of the difference between writing C and writing JavaScript. Like, yeah. What level of abstraction do you want? It's definitely possible though, and that's for limitations. it, the same limitations as, as existing browsers apply. like the web assembly support in Deno is equivalent to the web assembly support in Chrome. so you can do, uh, many things like multi-threading and, and stuff like that already. but especially around, shared mutable memory, um, and having access to that memory from JavaScript. That's something which is a real difficulty with web assembly right now. yeah, growing web assembly memory is also rather difficult right now. There's, there's a, there's a couple inherent limitations right now with web assembly itself. Um, but those, those will be worked out over time. And, and Deno is like very up to date with the version of, of the standard, it, it implements, um, through v8. Like we're, we're, we're up to date with Chrome Beta essentially all the time. So, um, yeah. Any, anything you see in, in, in Chrome beta is gonna be in Deno already. Deno Deploy [00:52:58] Jeremy: So you talked a little bit about this before, the Deno team, they have their own, hosting. Platform called Deno Deploy. So I wonder if you could explain what that is. [00:53:12] Luca: Yeah, so Deno has this really nice, this really nice concept of permissions which allow you to, sorry, I'm gonna start somewhere slightly, slightly unrelated. Maybe it sounds like it's unrelated, but you'll see in a second. It's not unrelated. Um, Deno has this really nice permission system which allows you to sandbox Deno programs to only allow them to do certain operations. For example, in Deno, by default, if you try to open a file, it'll air out and say you don't have read permissions to read this file. And then what you do is you specify dash, dash allow read um, maybe you have to give it. they can either specify, allow, read, and then it'll grant to read access to the entire file system. Or you can explicitly specify files or folders or, any number of things. Same goes for right permissions, same goes for network permissions. Um, same goes for running subprocesses, all these kind of things. And by limiting your permissions just a little bit. Like, for example, by just disabling sub-processes and foreign function interface, but allowing everything else, allowing reeds and allowing network access and all that kind of stuff. we can run Deno programs in a way that is significantly more cost effective to you as the end user than, and, and like we can cold start them much faster than, like you may be able to with a, with a more conventional container based, uh, system. So what, what do you, what Deno Deploy is, is a way to run JavaScript or Deno Code, on our data centers all across the world with very little latency. like you can write some JavaScript code which execute, which serves HTTP requests deploy that to our platform, and then we'll make sure to spin that code up all across the world and have your users be able to access it through some URL or, or, or some, um, custom domain or something like that. and this is some, this is very similar to CloudFlare workers, for example. Um, and it's like Netlify Edge functions is built on top of Deno Deploy. Like Netlify Edge functions is implemented on top of Deno Deploy, um, through our sub hosting product. yeah, essentially Deno Deploy is, is, um, yeah, a cloud hosting service for JavaScript, um, which allows you to execute arbitrary JavaScript. and there there's a couple, like different directions we're going there. One is like more end user focused, where like you link your GitHub repository and. Like, we'll, we'll have a nice experience like you do with Netlify and Versace, that word like your commits automatically get deployed and you get preview deployments and all that kind of thing. for your backend code though, rather than for your front end websites. Although you could also write front-end websites and you know, obviously, and the other direction is more like business focused. Like you're writing a SaaS application and you want to allow the user to customize, the check like you're writing a SaaS application that provides users with the ability to write their own online store. Um, and you want to give them some ability to customize the checkout experience in some way. So you give them a little like text editor that they can type some JavaScript into. And then when, when your SaaS application needs to hit this code path, it sends a request to us with the code, we'll execute that code for you in a secure way. In a secure sandbox. You can like tell us you, this code only has access to like my API server and no other networks to like prevent data exfiltration, for example. and then you do, you can have all this like super customizable, code in inside of your, your SaaS application without having to deal with any of the operational complexities of scaling arbitrary code execution, or even just doing arbitrary code execution, right? Like it's, this is a very difficult problem and give it to someone else and we deal with it and you just get the benefits. yeah, that's Deno Deploy, and it's built by the same team that builds the Deno cli. So, um, all the, all of your favorite, like Deno cli, or, or Deno APIs are available in there. It's just as web standard is Deno, like you have fetch available, you have blob available, you have web crypto available, that kind of thing. yeah. Running code in V8 isolates [00:56:58] Jeremy: So when someone ships you their, their code and you run it, you mentioned that the, the cold start time is very low. Um, how, how is the code being run? Are people getting their own process? It sounds like it's not, uh, using containers. I wonder if you could explain a little bit about how that works. [00:57:20] Luca: Yeah, yeah, I can, I can give a high level overview of how it works. So, the way it works is that we essentially have a pool of, of Deno processes ready. Well, it's not quite Deno processes, it's not the same Deno CLI that you download. It's like a modified version of the Deno CLI based on the same infrastructure, that we have spun up across all of our different regions across the world, uh, across all of our different data centers. And then when we get a request, we'll route that request, um, the first time we get request for that, that we call them deployments, that like code, right? We'll take one of these idle Deno processes and will assign that code to run in that process, and then that process can go serve the requests. and these process, they're, they're, they're isolated and they're, you. it's essentially a V8 isolate. Um, and it's a very, very slim, it's like, it's a much, much, much slimmer version of the Deno cli essentially. Uh, which the only thing it can do is JavaScript execution and like, it can't even execute type script, for example, like type script is we pre-process it up front to make the the cold start faster. and then what we do is if you don't get a request for some amount of. , we'll, uh, spin down that, um, that isolate and, uh, we'll spin up a new idle one in its place. And then, um, if you get another request, I don't know, an hour later for that same deployment, we'll assign it to a new isolate. And yeah, that's a cold start, right? Uh, if you have an isolate which receives, or a, a deployment rather, which receives a Bunch of traffic, like let's say you receive a hundred requests per second, we can send a Bunch of that traffic to the same isolate. Um, and we'll make sure that if, that one isolate isn't able to handle that load, we'll spin it out over multiple isolates and we'll, we'll sort of load balance for you. Um, and we'll make sure to always send to the, to the point of present that's closest to, to the user making the request. So they get very minimal latency. and they get we, we've these like layers of load balancing in place and, and, and. I'm glossing over a Bunch of like security related things here about how these, these processes are actually isolated and how we monitor to ensure that you don't break out of these processes. And for example, Deno Deploy does, it looks like you have a file system cuz you can read files from the file system. But in reality, Deno Deploy does not have a file system. Like the file system is a global virtual file system. which is, is, uh, yeah, implemented completely differently than it is in Deno cli. But as an end user you don't have to care about that because the only thing you care about is that it has the exact same API as the Deno cli and you can run your code locally and if it works there, it's also gonna work in deploy. yeah, so that's, that's, that's kind of. High level of Deno Deploy. If, if any of this sounds interesting to anyone, by the way, uh, we're like very actively hiring on, on Deno Deploy. I happen to be the, the tech lead for, for a Deno Deploy product. So I'm, I'm always looking for engineers, to, to join our ranks and, and build cool distributed systems. Deno.com/jobs. [01:00:15] Jeremy: for people who aren't familiar with the isolates, are these each run in their own processes, or do you have a single process and that has a whole Bunch of isolates inside it? [01:00:28] Luca: in, in the general case, you can say that we run, uh, one isolate per process. but there's many asterisks on that. Um, because, it's, it's very complicated. I'll just say it's very complicated. Uh, in, in the general case though, it's, it's one isolate per process. Yeah. Configuring permissions [01:00:45] Jeremy: And then you touched a little bit on the permissions system. Like you gave the example of somebody could have a website where they let their users give them code to execute. how does it look in terms of specifying what permissions people have? Like, is that a configuration file? Are those flags you pass in? What, what does that look? [01:01:08] Luca: Yeah. So, so that product is called sub hosting. It's, um, slightly different from our end user platform. Um, it's essentially a service that allows you to, like, you email us, well, we'll send you a, um, onboard you, and then what you can do is you can send HTTP requests to a certain end point with a, authentication token and. a reference to some code to execute. And then what we'll do is, we'll, um, when we receive that HTTP request, we'll fetch the code, it's spin up and isolate, execute the code. execute the code. We serve the request, return you the response, um, and then we'll pipe logs to you and, and stuff like that. and the, and, and part of that is also when we, when we pull the, um, the, the code for to spin up the isolate, that code doesn't just include the code that we're executing, but also includes things like permissions, and, and various other, we call this isolate configuration. Um, you can inspect, this is all public. we have public docs for this at Deno.com/subhosting. I think. Yes, Deno.com/subhosting. [01:02:08] Jeremy: And is that built on top of something that's a part of the public Deno project, the open source part? Or is this specific to this sub hosting
Software Engineering Radio - The Podcast for Professional Software Developers
J.R. Storment and Mike Fuller discuss cloud financial operations (FinOps) with host Akshay Manchale. They consider the importance of a financial operations strategy for cloud-based infrastructure. J.R. and Mike discuss the differences between operating your own data center and running in the cloud, as well as the problems that doing so creates in understanding and forecasting cloud spend. Mike details the Cloud FinOps lifecycle by first attributing organizational cloud spend through showbacks and chargebacks to individual teams and products. JR describes the two levers available for optimization once an organization understands where they're spending their cloud budget. They discuss complexities that arise from virtualized infrastructure and techniques to attribute cloud usage to the correct owners, and close with some recommendations for engineering leaders who are getting started on cloud FinOps strategy.
Xe Iaso is the Archmage of Infrastructure at Tailscale and previously worked at Heroku.This episode originally aired on Software Engineering Radio but includes some additional discussion about their blog near the end of the episode.Topics covered: Use cases for VPNs Simplifying service authentication by identifying users via IP Peer-to-peer vs centralized "Virtual Pain Networks" Tailscale's tech stack and why they forked the go compiler DERP relay servers Struggling with the iOS network extension size limit The surprisingly small amount of infrastructure required to run a VPN Running your company on your own product Working at Heroku vs Tailscale Using the socratic style of debate in technical blog posts Related Links @theprincessxena Xe's Blog ACL samples Go links origin story How Tailscale works Tailscale SSH How Tailscale assigns IP addresses Hey linker, can you spare a meg? My Blog is Hilariously Overengineered to the Point People Think it's a Static Site The Sheer Terror of PAM Transcript[00:00:00] Jeremy: Today I'm talking to Xe Iaso, they're the archmage of infrastructure at tailscale, and they also have a great blog everyone should check out. Xe, welcome to software engineering radio.[00:00:12] Xe: Thanks. It's great to be here. [00:00:14] Jeremy: I think the first thing we should start with, is what's a, a VPN, because I think some people they may have used it to remote into their workplace or something like that. But I think the, the scope of what it's good for and what it does is a lot broader than that. So maybe you could talk a little bit about that first.[00:00:31] Xe: Okay. a VPN is short for virtual private network. It's basically a fake network that's overlaid on top of existing networks. And then you can use that network to do whatever you would with a normal computer network. this term has been co-opted by companies that are attempting to get into the, like hide my ass style market, where, you know, you encrypt your internet information and keep it safe from hackers.But, uh, so it makes it really annoying and hard to talk about what a VPN actually is. Because tailscale, uh, the company I work for is closer to like the actual intent of a VPN and not just, you know, like hide your internet traffic. That's already encrypted anyway with another level of encryption and just make a great access point for, uh, three letter agencies.But are there, use cases, past that, like when you're developing a piece of software, why would you decide to use a VPN outside of just because I want my, you know, my workers to be able to get access to this stuff.[00:01:42] Xe: So something that's come up, uh, when I've been working at tailscale is that sometimes we'll make changes to something. And it'll be changes to like the user experience of something on the admin panel or something. So in a lot of other places I've worked in order to have other people test that, you know, you'd have to push it to the cloud.It would have to spin up a review app in Heroku or some terrifying terraform of abomination would have to put it out onto like an actual cluster or something. But with tail scale, you know, if your app is running locally, you just give like the name of your computer and the port number. And you know, other people are able to just see it and poke it and experience it.And that basically turns the, uh, feedback cycle from, you know, like having to wait for like the state of the world to converge, to, you know, make a change, press F five, give the URL to a coworker and be like, Hey, is this Gucci?they can connect to your app as if you were both connected to the same switch.[00:02:52] Jeremy: You don't have to worry about, pushing to a cloud service or opening ports, things like that.[00:02:57] Xe: Yep. It will act like it's in the same room, even when they're not it'll even work. if you're at both at Starbucks and the Starbucks has reasonable policies, like holy crap, don't allow devices to connect to each other directly. so you know, you're working on. Like your screenplay app at your Starbucks or something, and you have a coworker there and you're like, Hey, uh, check this out and, uh, give them the link.And then, you know, they're also seeing the screenplay editor.[00:03:27] Jeremy: in terms of security and things like that. I mean, I'm picturing it kind of like we were sitting in the same room and there's a switch and we both plugged in. Normally when you do something like that, you kind of have, full access to whatever else is on the switch. Uh, you know, provided that's not being blocked by a, a firewall.is there like a layer of security on top of that, that a VPN service like tailscale would provide.[00:03:53] Xe: Yes. Um, there are these things called access control lists, which are kind of like firewall rules, except you don't have to deal with like the nightmare of writing an IP tables rule that also works in windows firewall and whatever they use in Mac OS. The ACL rules are applied at the tailnet level for every device in the tailnet.So if you have like developer machines, you can put people into groups as things like developers and say that developer machines can talk to production, but not people in QA. They can only talk to testing and people on SRE have, you know, permissions to go everywhere and people within their own teams can connect to each other. you can make more complicated policies like that fairly easily.[00:04:44] Jeremy: And when we think about infrastructure for, for companies, you were talking about how there could be development, infrastructure, production, infrastructure, and you kind of separate it all out. when you're working with cloud infrastructure. A lot of times, there's the, I always forget what it stands for, but there's like IAM.There's like policies that you can set up with the cloud provider that says these users can access this, or these machines can access this. And, and I wonder from your perspective, when you would choose to use that versus use something at the, the network or the, the VPN level.[00:05:20] Xe: The way I think about it is that things like IAM enforce, permissions for like more granularly scoped things like can create EC2 instances or can delete EC2 instances or something like that. And that's just kind of a different level of thing. uh, tailscale, ACLs are more, you know, X is allowed to connect to Y or with tailscale, SSH X is allowed to connect as user Y.and that's really different than like arbitrary capability things like IAM offers.you could think about it as an IAM system, but the main permissions that it's exposing are can X connect to Y on Zed port.[00:06:05] Jeremy: What are some other use cases where if you weren't using a VPN, you'd have to do a lot more work or there's a lot more complexity, kind of what are some cases where it's like, okay, using a VPN here makes a lot of sense.(The quick and simple guide to go links https://www.trot.to/go-links) [00:06:18] Xe: There is a service internal to tailscale called go, which is a, clone of Google's so-called go links where it's basically a URL shortener that lives at http://go. And, you know, you have go/something to get to some internal admin service or another thing to get to like, you know, the company directory and notion or something, and this kind of thing you could do with a normal setup, you know, you could set it up and have to do OAuth challenges everywhere and, you know, have to put and make sure that everyone has the right DNS configuration so that, it shows up in the right place.And then you have to deal with HTTPS um, because OAuth requires HTTPS for understandable and kind of important reasons. And it's just a mess. Like there's so many layers of stuff like the, the barrier to get, you know, like just a darn URL, shortener up turns from 20 minutes into three days of effort trying to, you know, understand how these various arcane things work together.You need to have state for your OAuth implementation. You need to worry about what the hell a a JWT is (sigh) . It's it it's just bad. And I really think that something like tailscale with everybody has an IP address. In order to get into the network, you have to sign in with your, auth provider, your, a provider tells tailscale who you are.So transitively every IP address is tied to an owner, which means that you can enforce access permission based on the IP address and the metadata about it that you grab from the tailscale. daemon, it's just so much simpler. Like you don't have to think about, oh, how do I set up OAuth this time? What the hell is an oauth proxy?Um, what is a Kubernetes? That sort of thing you just think about like doing the thing and you just do it. And then everything else gets taken care of it. It's like kind of the ultimate network infrastructure, because it's both omnipresent and something you don't have to think about. And I think that's really the power of tailscale.[00:08:39] Jeremy: typically when you would spin up a, a service that you want your developers or your system admins, to be able to log into, you would have to have some way of authenticating and authorizing that user. And so you were talking about bringing in OAuth and having your, your service understand that.But I, I guess what you're saying is that when you have something like tailscale, that's kind of front loaded, I guess you, you authenticate with tail scale, you get onto the network, you get your IP. And then from that point on you can access all these different services that know like, Hey, because you're on the network, we know you're authenticated and those services can just maybe map that IP that's not gonna change to like users in some kind of table. Um, and not have to worry about figuring out how do I authenticate this user.[00:09:34] Xe: I would personally more suggest that you use the, uh, whois, uh, look up route in the tailscale daemon's local API, but basically, yeah, you don't really have to worry too much about like the authentication layer because the authentication layer has already been done. You know, you've already done your two factor with Gmail or whatever, and then you can just transitively push that property onto your other machines.[00:10:01] Jeremy: So when you talk about this, this whois daemon, can you give an example of I'm in the network now I'm gonna make a service call to an application. what, what am I doing with this? This whois daemon?[00:10:14] Xe: It's more of like a internal API call that we expose via tailscaled's, uh, Unix, socket. but basically you give it an IP address and a port, and it tells you who the person is. It's kind of like the Unix ident protocol in a way, except completely not. And at a high level, you know, if you have something like a proxy for Grafana, you have that proxy for Grafana, make a call to the local tailscale daemon, and be like, Hey, who was this person?And the tailscale, daemon will spit back at JSON object. Like, oh, it's this person on this device and there you can do additional logic like maybe you shouldn't be allowed to delete things from an iOS device, you know, crazy ideas like that. there's not really support for like arbitrary capabilities and tailscaled at the time of recording, but we've had some thoughts would be cool.[00:11:17] Jeremy: would that also include things like having roles, for example, even if it's just strings, um, that you get back so that your application would know, okay. This person, is supposed to have admin access to this service based on what I got back from, this, this service.[00:11:35] Xe: Not currently, uh, you can probably do it via convention or something, but what's currently implemented in the actual, like, source code and user experience that they, you can't do that right now. Um, it is something that I've been, trying to think about different ways to solve, but it's also a problem.That's a bit big for me personally, to tackle.[00:11:59] Jeremy: there's, there's so many, I guess, different ways of doing it. That it's kind of interesting to think of a solution that's kind of built into the, the network. Yeah.[00:12:10] Xe: Yeah. and when I describe that authentication thing to some people, it makes them recoil in shock because there's kind of a Stockholm syndrome type effect with security, for a lot of things where, the easy way to do something and the secure way to do something are, you know, like completely opposite and directly conflicting with each other in almost every way.And over time, people have come to associate security or like corporate VPNs as annoying, complicated, and difficult. And the idea of something that isn't annoying, complicated or difficult will make people reject it, like just on principle, because you know, they've been trained that, you know, VPN equals virtual pain network and it, it's hard to get that association outta people's heads because you know, a lot of VPNs are virtual pain networks.Like. I used to work for Salesforce and Salesforce had this corporate VPN where no matter what you did, all of your traffic would go out to the internet from their data center. I think it was in San Francisco or something. And I was in the Seattle area. So whenever I had the VPN on my latency to Google shot up by like eight times and being a software person, you know, I use Google the same way that others breathe and it, it was just not fun.And I only had the VPN on for the bare minimum of when I needed it. And, oh God, it was so bad.[00:13:50] Jeremy: like some people, when they picture a VPN, they picture exactly what you're describing, where all of my traffic is gonna get routed to some central point. It's gonna go connect to the thing for me and then send the result back. so maybe you could talk a little bit about why that's, that's maybe a wrong assumption, I guess, in the case of tailscale, or maybe in the case of just more modern VPN solutions.[00:14:13] Xe: Yeah. So the thing that I was describing is what I've been lovingly calling the, uh, single point of failure as a service type model of VPN, where, you know, you have like the big server somewhere, it concentrates all the connections and, you know, like does things to make the computer feel like they've teleported over there, but overall it's a single point of failure.And if that falls over, you know, like goodbye, VPN. everybody's just totally screwed. And in contrast, tailscale does a more peer-to-peer thing so that everyone is basically on equal footing. Everyone can send traffic directly to each other, and if it can't get directly to there, it'll use a network of, uh, relay servers, uh, lovingly called Derp and you don't have to worry about, your single point of failure in your cluster, because there's just no single point of failure.Everything will directly communicate as much as possible. And if it can't, it'll still communicate anyway.[00:15:18] Jeremy: let's say I start up my computer and I wanna connect to a server in a data center somewhere at the very beginning, am I connecting to some server hosted at tailscale? And then. There's some kind of negotiation process where after that I connect directly or do I just connect directly straight away?[00:15:39] Xe: If you just turn on your laptop and log in, you know, to it signs into tailscale and gets you on the tailnet and whatnot, then it will actually start all connections via Derp just so that it can negotiate the, uh, direct connection. And in case it can't, you know, it's already connected via Derp so it just continues the connection with Derp and this creates a kind of seamless magic type experience where doing things over Derp is slower.Yes, it is measurably slower because you know, like you're not going directly, you're doing TCP inside of TCP. And you know, that comes with a average minefield of lasers or whatever you call it. And it does work though. It's not ideal if you wanna do things like copy large amounts of data, but if you want just want ssh into prod and see the logs for what the heck is going on and why you're getting paged at 3:00 AM. it's pretty great.[00:16:40] Jeremy: What you, you were calling Derp is it where you have servers kind of all over the world and somehow it determines which one's, I guess, is it which one's closest to your destination or which one's closest to you. I'm kind of[00:16:54] Xe: It's really interesting. It's one of the most weird distributed systems, uh, type things that I've ever seen. It's the kind of thing that could only come outta the mind of an X Googler, but basically every tailscale, every tailscale node has a connection to all of the Derp servers and through process of, you know, latency testing.It figures out which connection is the fastest and the lowest latency. And it calls that it's home Derp but because it's connected to everything is connected to every Derp you can have two people with different home Derps getting their packets relayed too other clients from different Derps.So, you know, if you have a laptop in Ottawa and a laptop in San Francisco, the laptop in San Francisco will probably use the, uh, Derp that's closest to it. But the laptop in Ottawa will also use the Derp that's closest to it. So you get this sort of like asynchronous thing, and it actually works out a lot better in practice, than you're probably imagining.[00:17:52] Jeremy: And then these servers, what was the, the technical term for them? Are they like relays or what's[00:17:58] Xe: They're relays. Uh, they only really deal with encrypted wire guard packets, and there's, no way for us at tailscale, to see the contents of Derp messages, it is literally just a forwarder. It, it literally just forwards things based on the key ID.[00:18:17] Jeremy: I guess if tail scale isn't able to decrypt the traffic, is, is that because the, the keys are only on the user's devices, like it's on their laptop and on the server they're trying to reach, or[00:18:31] Xe: Yeah. The private keys are live and die with those devices or the devices they were minted on. And the public keys are given to the coordination server and the coordination server spreads those around to every device in your tailnet. It does some limiting so that like, if you don't have ACL access to something, you don't get the private key, you don't get the, uh, public key for it.The public key, not the private key, the public key, not the private key. And yeah. Then, you know, you just go that way and it'll just figure it out. It's pretty nice.[00:19:03] Jeremy: When we're kind of talking about situations where it can't connect directly, that's where you would use the relay. what are kind of the typical cases where that happens, where you, you aren't able to just connect directly?[00:19:17] Xe: Hotel, wifi and paranoid network security setups, hotel wifi is the most notorious one because you know, you have like an overpriced wifi connection. And if you bring, like, I don't know like, You you're recording a bunch of footage on your iPhone. And because in, 2022. The iPhone has the USB2 connection on it.And you know, you wanna copy that. You wanna use the network, but you can't. So you could just let it upload through iCloud or something, or, you know, do the bare minimum. You need to get the, to get the data off with Derp it wouldn't be ideal, but it would work. And ironically enough, that entire complexity involved with, you know, doing TCP inside of TCP to copy a video file over to your laptop might actually be faster than USB2, which is something that I did the math for a while ago.And I just started laughing.[00:20:21] Jeremy: Yeah, that that is pretty, pretty ridiculous [00:20:23] Xe: welcome to the future, man (laughs) .[00:20:27] Jeremy: in terms of connecting directly, usually when you have a computer on the internet, you don't have all your ports open, you don't necessarily allow, just anybody to send you traffic over UDP and so forth. let's say I wanna send, UDP data to a, a server on my network, but, you know, maybe it has some TCP ports open. I I'm assuming once I connect into the network via the VPN, I'm able to use other protocols and ports that weren't necessarily exposed. Is that correct?[00:21:01] Xe: Yeah, you can use UDP. you can do basically anything you would do on a normal network except multicast um, because multicast is weird.I mean, there's thoughts on how to handle multicast, but the main problem is that like wireguard, which is what is tail tailscale is built on top of, is, so called OSI model layer three network, where it's at like, you know, the IP address level and multicast is a layer two or data link layer type thing.And, those are different numbers and, you can't really easily put, you know, like broadcast packets into IP, uh, IPV4 thinks otherwise, but, uh, in practice, no people don't actually use the broadcast address.[00:21:48] Jeremy: so for someone who's, they, they have a project or their company wants to get started. I mean, what does onboarding look like? What, what do they have to do to get all these devices talking to one another?[00:22:02] Xe: basically you, install tail scale, you log in with a little GUI thing or on a Linux server, you run tailscale up, and then you all log to the, to a, like a G suite account with the same domain name. So, you know, if your domain is like example.com, then everybody logs in with their example.com G suite account.And, there is no step three, everything is allowed and everything can just connect and you can change the permissions from there. By default, the ACLs are set to a, you know, very permissive allow everyone to talk to everyone on any port. Uh, just so that people can verify that it's working, you know, you can ping to your heart's content.You can play Minecraft with others. You can, you know, host an HTTP server. You can SSH into your development box and and write blog post with emacs, whatever you want.[00:22:58] Jeremy: okay, you install the, the software on your servers, your workstations, your laptops, and so on. And then at, after that there's some kind of webpage or dashboard you would go in and say, I want these people to be able to access these things and [00:23:14] Xe: Mm-hmm [00:23:15] Jeremy: these ports and so on.[00:23:17] Xe: you, uh, can customize the access control rules with something that looks like JSON, but with trailing commas and comments allowed, and you can go from there to customize basically anything to your heart's content. you can set rules so that people on the DevOps team can access everything, but you know, maybe marketing doesn't need access to the production database.So you don't have to worry about that as much.[00:23:45] Jeremy: there's, there's kind of different options for VPNs. CloudFlare access, zero tier, there's, there's some kind of, I think it's Nebula from slack or something like that. so I was kind of curious from your perspective, what's the, difference between those kinds of services and, and tailscale.[00:24:04] Xe: I'm gonna lead this out by saying that I don't totally understand the differences between a lot of them, because I've only really worked with tailscale. I know things about the other options, but, uh, I have the most experience with tailscale but from what I've been able to tell, there are things that tailscale offers that others don't like reverse mapping of IP addresses to people, or, there's this other feature that we've been working on, where you can embed tail scale as a library inside your go application, and then write a internal admin service that isn't exposed to the internet, but it's only exposed over tailscale.And I haven't seen a way to do those things with those others, but again, I haven't done much research. Um, I understand that zero tier has some layer, two capabilities, but I've, I don't have enough time in the day to look into.[00:25:01] Jeremy: There's been different, I guess you would call them VPN protocols. I mean, there's people have probably worked with IP sec in some situations they may have heard of OpenVPN, wireguard. in the case of tailscale, I believe you chose to build it on top of wireguard.So I wonder if you could talk a little bit about why, you chose wireguard and, and maybe what makes it unique.[00:25:27] Xe: I wasn't on the team that initially wrote like the core of tailscale itself. But from what I understand, wire guard was chosen because, what overhead, uh, it's literally, you just encrypt the packets, you send it to the other server, the other server decrypts them. And you know, you're done. it's also based purely on the public key. Um, the key pairs involved. And from what I understand, like at the wireguard protocol level, there's no reason why you, why you would need an IP address at all in theory, but in practice, you kind of need an IP address because you know, everything sucks. But also wire guard is like UDP only, which I think it at it's like core implementation, which is a step up from like AnyConnect and OpenVPN where they have TCP modes.So you can experience the, uh, glorious, trash fire of TCP in TCP. And from what I understand with wireguard, you don't need to set up a certificate authority or figure out how the heck to revoke certificates. Uh, you just have key pairs and if a node needs to be removed, you delete the key pair and you're done.And I think that really matches up with a lot of the philosophy behind how tailscale networks work a lot better. You know, you have a list of keys and if the network changes the list of keys changes, that's, that's the end of the story.So maybe one of the big selling points was just What has the least amount of things I guess, to deal with, or what's the, the simplest, when you're using a component that you want to put into your own product, you kind of want the least amount of things that could go wrong, I guess.[00:27:14] Xe: Yeah. It's more like simple, but not like limiting. Like, for example, a set of tinker toys is simple in that, you know, you can build things that you don't have to worry too much about the material science, but a set of tinker toys is also limiting because you know, like they're little wooden, dowels and little circles made out of wind that you stick the dowels into, you know, you can only do so much with it.And I think that in comparison, wireguard is simple. You know, there's just key pairs. They're just encryption. And it's simple in it's like overall theory and it's implementation, but it's not limiting. Like you can do pretty much anything you want with it.inherently whenever we build something, that's what we want, but that's a, that's an interesting way of putting it. Yeah.[00:28:05] Xe: Yeah. It. It can be kind of annoyingly hard to figure out how to make things as simple as they need to be, but still allow for complexity to occur. So you don't have to like set up a keyboard macro to write if error not equals nil over and over.[00:28:21] Jeremy: I guess the next thing I'd like to talk a little bit about is. We we've covered it a little bit, but at a high level, I understand that that tailscale uses wireguard, which is the open source, VPN protocol, I guess you could call it. And then there's the client software. You're saying you need to install on each of the servers and workstations.But there's also a, a control plane. and I wonder if you could kind of talk a little bit about I guess at a high level, what are all the different components of, of tailscale?[00:28:54] Xe: There's the agent that you install in your devices. The agent is basically the same between all the devices. It's all written in go, and it turns out that go can actually cross compile fairly well. So you have. Your, you know, your implementation in go, that is basically the, the same code, more or less running on windows, MacOS, freeBSD, Android, ChromeOS, iOS, Linux.I think I just listed all the platforms. I'm not sure, but you have that. And then there's the sort of control plane on tailscale's side, the control plane is basically like control, uh, which is, uh, I think a get smart reference. and that is basically a key dropbox. So, you know, you You authenticate through there. That's where the admin panel's hosted. And that's what tells the different tailscale nodes uh, the keys of all the other machines on the tailnet. And also on tailscale side there's, uh, Derp which is a fleet of a bunch of different VPSs in various clouds, all over the world, both to try to minimize cost and to, uh, have resiliency because if both digital ocean and Vultr go down globally, we probably have bigger problems.[00:30:15] Jeremy: I believe you mentioned that the, the clients were written in go, are the control plane and the relay, the Derp portion. Are those also written in go or are they[00:30:27] Xe: They're all written and go, yeah,go as much as possible. Yeah.It's kind of what happens when you have some ex go team members is the core people involved in tail scale, like. There's a go compiler fork that has some additional patches that go upstream either can't accept, uh, won't accept or hasn't yet accepted, for a while. It was how we did things like trying to shave off by bites from binary size to attempt to fit it into the iOS network extension limit.Because for some reason they only allowed you to have 15 megabytes of Ram for both like your application and working Ram. And it turns out that 15 megabytes of Ram is way more than enough to do something like OpenVPN. But you know, when you have a peer-to-peer VPN engine, it doesn't really work that well.So, you know, that's a lot of interesting engineering challenge.[00:31:28] Jeremy: That was specifically for iOS. So to run it on an iPhone.[00:31:32] Xe: Yeah. Um, and amazingly after the person who did all of the optimization to the linker, trying to get the binary size down as much as possible, like replacing Unicode packages was something that's more coefficient, you know, like basically all but compressing parts of the binary to try to save space. Then the iOS, I think 15 beta dropped and we found out that they increased the network extension Ram limit to 50 megabytes and the look of defeat on that poor person's face. I feel very bad for him.[00:32:09] Jeremy: you got what you wanted, but you're sad about it,[00:32:12] Xe: Yeah.[00:32:14] Jeremy: so that's interesting too. you were using a fork of the go compiler [00:32:19] Xe: Basically everything that is built is built using, uh, the tailscale fork, of the go compiler.[00:32:27] Jeremy: Going forward is the sort of assumption is that's what you'll do, or is it you're, you're hoping you can get this stuff upstreamed and then eventually move off of it.[00:32:36] Xe: I'm pretty sure that, I, I don't know if I can really make a forward looking statement like that, but, I've come to accept the fact that there's a fork of the go compiler. And as a result, it allows a lot more experimentation and a bit more of control, a bit more control over what's going on. like I'm, I'm not like the most happy with it, but I've, I understand why it exists and I'm, I've made my peace with it.[00:33:07] Jeremy: And I suppose it, it helps somewhat that the people who are working on it actually originally worked on the, go compiler at Google. Is that right?[00:33:16] Xe: Oh yeah. If, uh, there weren't ex go team people working on that, then I would definitely feel way less comfortable about it. But I trust that the people that are working on it, know what they're doing at least enough.[00:33:30] Jeremy: I, I feel like, that's, that's kind of the position we put ourselves in with software in general, right? Is like, do we trust our ourselves enough to do this thing we're doing?[00:33:39] Xe: Yeah. And trust is a bitch.[00:33:44] Jeremy: um, I think one of the things that's interesting about tail scale is that it's a product that's kind of it's like network infrastructure, right? It's to connect you to your other devices. And that's a little different than somebody running a software as a service. And so. how do you test something that's like built to support a network and, and how is that different than just making a web app or something like that.[00:34:11] Xe: Um, well, it's a lot more complicated for one, especially when you have to have multiple devices in the mix with multiple different operating systems. And I was working on some integration tests, doing stuff for a while, and it was really complicated. You have to spin up virtual machines, you know, you have to like make sure the virtual machines are attempting to download the version of the tailscale client you wanna test and. It's it's quite a lot in practice.[00:34:42] Jeremy: I mean, do you have a, a lab, you know, with Android phones and iPhones and laptops and all this sort of stuff, and you have some kind of automated test suite to see like, Hey, if these machines are in Ottawa and, my servers in San Francisco, like you're mentioning before that I can get from my iPhone to this server and the data center over here, that kind of thing.[00:35:06] Xe: What's the right way to phrase this without making things look bad. Um, it's a work in progress. It it's, it's really a hard problem to solve, uh, especially when the company is fully remote and, uh, like. Address that's listed on the business records is literally one of the founders condos because you know, the company has no office.So that makes the logistics for a lot of this. Even more fun.[00:35:37] Jeremy: Probably any company that's in an early stage feels the same way where it's like, everything's a work in progress and we're just gonna, we're gonna keep going and we're gonna get there. And as long as everything keeps running, we're good.[00:35:50] Xe: Yeah. I, I don't like thinking about it in that way, because it kind of sounds like pessimistic or defeatist, but at some level it's, it, it really is a work in progress because it's, it's a hard problem and hard problems take a lot of time to solve, especially if you want a solution that you're happy with.[00:36:10] Jeremy: And, and I think it's kind of a unique case too, where it's not like if it goes down, it's like people can't do their job. Right. So it's yeah.[00:36:21] Xe: Actually, if tail scales like control plane goes down, I don't think people would notice until they tried to like boot up a, a reboot, a laptop, or connect a new device to their tailnet. Because once, once all the tailscale agents have all of the information they need from the control plate, you know, they just, they just continue on independently and don't have to care.Derp is also fairly independent of the, like the key dropbox component. And, you know, if that, if that goes down Derp doesn't care at all,[00:37:00] Jeremy: Oh, okay. So if the control plane is down, as long as you had authenticated earlier in the day, you can still, I don't know if it's cached or something, but you can still continue to reach the relay servers, the Derp servers or your, [00:37:15] Xe: other nodes. Yeah. I, I'm pretty sure that in most cases, the control plane could be down for several hours a day and nobody would notice unless they're trying to deal with the admin panel.[00:37:28] Jeremy: Got it. that's a little bit of a relief, I suppose, for, for all of you running it,[00:37:33] Xe: Yeah. Um, it's also kind of hard to sell people on the idea of here is a VPN thing. You don't need to self host it and they're like, what? Why? And yeah, it can be fun.[00:37:49] Jeremy: though, I mean, I feel like anybody who has, self-hosted a VPN, they probably like don't really wanna do it. I don't know. Maybe I'm wrong.[00:38:00] Xe: well, so a lot of the idea of wanting to self host it is, uh, I think it's more of like trying to be self-sufficient and not have to rely on other companies, failures dictating your company's downtime. And, you know, like from some level that's very understandable. And, you know, if, you know, like tail scale were to get bought out and the new owners would, you know, like basically kill the product, they'd still have something that would work for them.I don't know if like such a defeatist attitude is like productive. But it is certainly the opinion that I have received when I have asked people why they wanna self-host. other people, don't want to deal with identity providers or the, like, they wanna just use their, they wanna use their own identity provider.And what was hilarious was there was one, there was one thing where they were like our old VPN server died once and we got locked out of our network. So therefore we wanna, we wanna self-host tailscale in the future so that this won't happen again.And I'm like, buddy, let's, let's just, let's just take a moment and retrace our steps here. CAuse I don't think you mean what you think you mean.[00:39:17] Jeremy: yeah, yeah. [00:39:19] Xe: In general, like I suggest people that, you know, even if they're like way deep into the tailscale, Kool-Aid they still have at least one other method of getting into their servers. Ideally, two. I, I admit that I'm, I come from an SRE style background and I am way more paranoid than most, but it, I usually like having, uh, a backup just in case.[00:39:44] Jeremy: So I, I suppose, on, on that note, let's, let's talk a little bit about your role at tailscale. the title of the archmage of infrastructure is one of the, the coolest titles I've, uh, I've seen. So maybe you can go a little bit into what that entails at, at tailscale.[00:40:02] Xe: I started that title as a joke that kind of stuck, uh, my intent, my initial intent was that every time someone asked, I'd say, I'd have a different, you know, like mystic sounding title, but, uh, archmage of infrastructure kind of stuck. And since then, I've actually been pivoting more into developer relations stuff rather than pure software engineering.And, from the feedback that I've gotten at the various conferences I've spoken at, they like that title, even though it doesn't really fit with developer relations work at all, it it's like it fits because it doesn't. You know, that kind of coney kind of way.[00:40:40] Jeremy: I guess this would go more into the, the infrastructure side, but. What does the, the scale of your infrastructure look like? I mean, I, I think that you touched a little bit on the fact that you have relay servers all over the place and you've got this control plane, but I wonder if you could give people a little bit of perspective of what kind of undertaking this is.[00:41:04] Xe: I am pretty sure at this point we have more developer laptops and the like, than we do production servers. Um, I'm pretty sure that the scale of the production of production servers are in the tens, at most. Um, it turns out that computers are pretty darn and efficient and, uh, you don't really need like a lot of computers to do something amazing.[00:41:27] Jeremy: the part that I guess surprises me a little bit is, is the relay servers, I suppose, because, I would imagine there's a lot of traffic that goes through those. are you finding that just most of the time they just aren't needed and usually you can make a direct connection and that's why you don't need too many of these.[00:41:45] Xe: From what I understand. I don't know if we actually have a way to tell, like what percentage of data is going over the relays versus not. And I think that was an intentional decision, um, that may have been revisited I'm operating based off of like six to 12 month old information right now. But in general, like the only state that the relay servers has is in Ram.And whenever the relay, whenever you disconnect the server, the state is dropped.[00:42:18] Jeremy: Okay.[00:42:19] Xe: and even then that state is like, you know, this key is listening. It is, uh, connected, uh, in case you wanna send packets over here, I guess. it's a bit less bandwidth than you're probably thinking it's not like enough to max it out 24/7, but it is, you know, measurable and there are some, you know, costs associated with it. This is also why it's on digital ocean and vulture and not AWS. but in general, it's a lot less than you'd think. I'm pretty sure that like, if I had to give a baseless assumption, I'd say that probably about like 85% of traffic goes directly.And the remaining is like the few cases in the whole punching engine that we haven't figured out yet. Like Palo Alto fire walls. Oh God. Those things are a nightmare.[00:43:13] Jeremy: I see. So it's most of the traffic actually ends up. Being straight peer to peer. Doesn't have to go through your infrastructure. And, and therefore it's like, you don't need too many machines, uh, to, to make this whole thing work.[00:43:28] Xe: Yeah. it turns out that computers are pretty darn fast and that copying data is something that computers are really good at doing. Um, so if you have, you know, some pretty darn fast computers, basically just sitting there and copying data back and forth all day, like it, you can do a lot with shockingly little.Um, when I first started, I believe that the Derp VMs were using like sometimes as little as one core and 512 megabytes of Ram as like a primary Derp. And, you know, we only noticed when, there were some weird connection issues for people that were only on Derp because there were enough users that the machine had ran out of memory.So we just, you know, upped the, uh, virtual machine size and called it a day. But it's, it's truly remarkable how mu how far you can get with very little[00:44:23] Jeremy: And you mentioned the relay servers, the, the Derp servers were on services like digital ocean and Vultr. I'm assuming because of the, the bandwidth cost, for the control plane, is, is that on AWS or some other big cloud provider?[00:44:39] Xe: it's on AWS. I believe it's in EU central 1.[00:44:44] Jeremy: You're helping people connect from device to device and in a situation like that. what does monitoring look like in, in incidents? Like what are you looking for to determine like, Hey, something's not working.[00:44:59] Xe: there's monitoring with, you know, Prometheus, Grafana, all of that stuff. there are some external probing things. there's also some continuous functional testing for trying to connect to tailscale and like log in as an account. And if that fails like twice in a row, then, you know, something's very wrong and, you know, raise the alarm.But in general. A lot of our monitoring is kind of hard at some level because you know, we're tailscale at a tailscale can't always benefit from tailscale to help operate tail scale because you know, it's tailscale. Um, so it, it still trying to figure out how to detangle the chicken and egg situation.It's really annoying.there's the, the term dog fooding, right? Where they're saying like, oh, we, we run, um, our own development on our own platform or our own software. but I could see when your product is network infrastructure, VPNs, where that could be a little, little dicey.[00:46:06] Xe: Yeah, it is very annoying. But I I'm pretty sure we'll figure something out. It is just a matter of when, another thing that's come up is we've kind of wanted to use tailscale's SSH features, where you specify ACLs in your, you specify ACL rules to allow people to SSH, to other nodes as various users.but if that becomes your main access to production, then you know, like if tailscale is down and you're tailscale, like how do you get in, uh, then there's been various philosophical discussions about this. it's also slightly worse if you use what's called check mode in SSH, where, uh, tail scale, SSH without check mode, you know, you just, it, the, the server checks against the policy rules and the ACL and if it. if it's okay, it lets you in. And if not, it says no, but with check mode, there's also this like eight hour, there's this like eight hour quote unquote lifetime for you to have like sudo mode on GitHub, where you do an auth an auth challenge with your auth aprovider. And then, you know, you're given a, uh, Hey, this person has done this thing type verification.And if that's down and that goes through the control plane, and if the control plane is down and you're tailscale, trying to debug the control plane, and in order to get into the control plane over tailscale, you need to use the, uh, control plane. It, you know, that's like chicken and egg problem level 78,which is a mythical level of chicken egg problem that, uh, has only been foretold in the legends of yore or something.[00:47:52] Jeremy: at that point, it sounds like somebody just needs to, to drive to the data center and plug into the switch.[00:47:59] Xe: I mean, It's not, it's not going to, it probably wouldn't be like, you know, we need to get a person with an angle grinder off of Craigslist type bad. Like it was with the Facebook BGP outage, but it it's definitely a chicken and egg problem in its own right.it makes you do a lot of lateral thinking too, which is also kind of interesting.[00:48:20] Jeremy: When, when you say lateral thinking, I'm just kind of curious, um, if you have an example of what you mean.[00:48:27] Xe: I don't know of any example that isn't NDAed. Um, but basically, you know, tail scale is getting to the, to the point where tailscale is relying on tailscale to make tailscale function and you know, yeah. This is classic oroboros style problem.I've heard a, uh, a wise friend of mine said that that is an ideal problem to have, which sounds weird at face value. But if you're getting to that point, that means that you're successful enough that, you know, you're having that problem, which is in itself a good thing, paradoxically.[00:49:07] Jeremy: better to have that problem than to have nobody care about the product. Right.[00:49:12] Xe: Yeah.[00:49:13] Jeremy: kind of on that, that note, um, you mentioned you worked at, at Salesforce, uh, I believe that was working on Heroku. I wonder if you could talk a little about your experience working at, you know, tailscale, which is kind of more of a, you know, early startup versus, uh, an established company like Salesforce.[00:49:36] Xe: So at the time I was working at Heroku, it definitely didn't feel like I was working at Salesforce for the majority of it. It felt like I was working, you know, at Heroku, like on my resume, I listed as Heroku. When I talked about it to people, I said, I worked at Heroku and that sales force was this, you know, mythical, Ohana thing that I didn't have to deal with unless I absolutely had to.By the end of the time I was working at Heroku, uh, the salesforce, uh, sort of started to creep in and, you know, we moved from tracking issues in GitHub issues. Like we were used to, to using their, oh, what's the polite way to say this, their creation, which is, which was like the moral equivalent of JIRA implemented on top of Salesforce.You had to be behind the VPN for it. And, you know, every ticket had 20 fields and, uh, there were no templates. And in comparison with tail scale, you know, we just use GitHub issues, maybe some like things in notion for doing like longer term tracking or Kanban stuff, but it's nice to not have. you know, all of the pomp and ceremony of filling out 20 fields in a ticket for like two sentences of this thing is obviously wrong and it's causing X to happen.Please fix.[00:51:08] Jeremy: I, I like that, that phrase, the, the creation, that's a very, very diplomatic term.[00:51:14] Xe: I mean, I can think of other ways to describe it, but I'm pretty sure those ways wouldn't be allowed on the podcast. So[00:51:25] Jeremy: Um, but, but yeah, I, I know what you mean for sure where, it, it feels like there's this movement from, Hey, let's just do what we need. Like let's fill in the information that's actually relevant and don't do anything else to a shift to, we need to fill in these 10 fields because that's the thing we do.Yeah.[00:51:48] Xe: Yeah. and in the time I've been working for tail scale, I'm like employee ID 12. And, uh, tail scale has gone from a company where I literally know everyone to just recently to the point where I don't know everyone anymore. And it's a really weird feeling. I've never been in a, like a small stage startup that's gotten to this size before, and I've described some of my feelings to other people who have been there and they're like, yeah, welcome to the club. So I figure a lot of it is normal. from what I understand, though, there's a lot of intentionality to try to prevent tail skill from becoming, you know, like Google style, complexity, organizational complexity, unless that is absolutely necessary to do something.[00:52:36] Jeremy: it's a function of size, right? Like as you have more people, more teams, then more process comes in. that's a really tricky balance to, to grow and still keep that feeling of, I'm just doing the thing, I'm doing the work rather than all this other process stuff.[00:52:57] Xe: Yeah, but it, I've also kind of managed to pigeonhole myself off into a corner with devrel stuff. And that's been nice. I've been working a bunch with, uh, like marketing people and, uh, helping out with support occasionally and doing a, like a godawful amount of writing.[00:53:17] Jeremy: the, the writing, for our audience's benefit, I, I think they should, they should really check out your blog because I think that the way you write your, your articles is very thoughtful in terms of the balance of the actual example code or example scripts and the descriptions and, and some there's a little bit of a narrative sometimes too.So, [00:53:40] Xe: Um, I'm actually more of a prose writer just by like how I naturally write things. And a lot of the style of how I write things is, I will take elements from, uh, the Socratic style of dialogue where, you know, you have the student and the teacher. And, you know, sometimes the student will ask questions that the teacher will answer.And I found that that's a particularly useful way to help model understanding or, you know, like put side concepts off into their own little blurbs or other things like that. I also started doing those conversation things with, uh, furry art, specifically to dunk on a homophobe that was getting very angry at furry art being in, uh, another person's blog.And that's it, it's occasionally fun to go into the, uh, orange website of bad takes and see the comments when people complain about it. oh gosh, the bad takes are hilariously good. Sometimes.[00:54:45] Jeremy: it's good that you have like a, a positive, mindset around that. I know some people can read, uh, that sort of stuff and go, you know, just get really bummed out. [00:54:54] Xe: One of the ways I see it is that a lot of the time algorithms are based on like sheer numbers. So if you like get something that makes people argue in the comments, that number will go up and because there's more comments on it, it makes more people more likely to, to read the article and click on it.So, sometimes I have been known to sprinkle, what's the polite way to say this. I've been known to sprinkle like intentionally kind of things that will, uh, get people and make them want to argue about it in the comments. Purely to make the engagement numbers rise up, which makes more people likely to read the article.And, it's kind of a dirty practice, but you know, it makes more people read the article and more people benefit. So, you know, like it's kind of morally neutral, I guess.[00:55:52] Jeremy: usually that, that seems like, a sketchy thing. But I feel like if it's in service to, uh, like a technical blog post, I mean, why not? Right.[00:56:04] Xe: And a lot of the times I'll usually have the like, uh, kind of bad take, be in a little conversation blurb thing so that people will additionally argue about the characterization of, you know, the imaginary cartoon shark or whatever.[00:56:20] Jeremy: That's good. It's the, uh, it's the Xe Xe universe that they're, they're stepping into.[00:56:27] Xe: I've heard people describe it, uh, lovingly as the xeiaso.net cinematic universe.I've had some ideas on how to expand it in the future with more characters that have more different kind of diverse backgrounds. But, uh, it turns out that writing this stuff is hard. Like actually very hard because you have to get this right.You have to get the right balance of like snark satire, uh, like enlightenment. Andit's, it's surprisingly harder than you'd think. Um, but after a while, I've just sort of managed to like figure out as I'm writing where the side tangents come off and which ones I should keep and which ones I should, uh, prune and which ones can also help, Gain deeper understanding with a little like Socratic dialogue to start with a Mo like an incomplete assumption, like an incomplete picture.And then, you know, a question of, wait, what about this thing? Doesn't that conflict with that? And like, well, yes. technically it does, but realistically we don't have to worry about that as much. So we can think about it just in terms of this bigger model and, uh, that's okay. Like, uh, I mentioned the OSI model earlier, you know, like the seven layer OSI model it's, you know, genuinely overkill for basically everything, except it's a really great conceptual model for figuring out the difference between, you know, like an ethernet cable, an ethernet, like the ethernet card, the IP stack TCP and, you know, TLS or whatever.I have a couple talks that are gonna be up by the time this is published. Uh, one of them is my, uh, rustconf talk on my, or what was it called? I think it was called the surreal horrors of PAM or something where I discussed my experience, trying to bug a PAM module in rust, uh, for work. And, uh, it's the kind of story where, you know, it's bad when you have a break point on dlopen.[00:58:31] Jeremy: That sounds like a nightmare.[00:58:32] Xe: Oh yeah. Like part of the attempting to fix that process involved, going very deep. We're talking like an HTML frame set in the internet archive for sunOS documentation that was written around the time that PAM was used. Like it's things that are bad enough were like everything in the frame set, but the contents had eroded away through bit rot and you know, you're very lucky just to have what you do.[00:59:02] Jeremy: well, I'm, I'm glad it was. It was you and not me. we'll get to, to hear about it and, and not have to go through the, the suffering ourselves.[00:59:11] Xe: yeah. One of the things I've been telling people is that I'm not like a brilliant programmer. Like I know a bunch of people who are definitely way smarter than me, but what I am is determined and, uh, determination is a bit stronger of a force than you'd think.[00:59:27] Jeremy: Yeah. I mean, without it, nothing gets done. Right.[00:59:30] Xe: Yeah.[00:59:31] Jeremy: as we wrap up, is there anything we missed or anything else you wanna mention? [00:59:36] Xe: if you wanna look at my blog, it's on xeiaso.net. That's X, E I a S o.net. Um, that's where I post things. You can see, like the 280 something articles at time of recording. It's probably gonna get to 300 at some point, oh God, it's gonna get to 300 at some point. Um, and yeah, from, I try to post articles about weekly, uh, depending on facts and circumstances, I have a bunch of talks coming up, like one about the hilarious over engineering I did in my blog.And maybe some more. If I get back positive responses from calls for paper submissions,[01:00:21] Jeremy: Very cool. Well, Xe thank you so much for, for coming on software engineering radio.[01:00:27] Xe: Yeah. Thank you for having me. I hope you have a good day and, uh, try out tailscale, uh, note my bias, but I think it's great.
Jonathan Shariat is the coauthor of the book Tragic Design and co-host of the Design Review Podcast. He's currently a Sr. Interaction Designer & Accessibility Program Lead at Google.This episode originally aired on Software Engineering Radio.Topics covered: How poor design kills in medical environments Causing harm with features meant to bring joy Considerations during the product development cycle Industry specific checklists and testing requirements Creating guiding principles for a team Why medical software often has poor UX Designing for crisis situations Why dark patterns can be bad in the long term Related Links @designuxui Tragic Design How Bad UX Killed Jenny Design Review podcast Deceptive Design TranscriptYou can help edit this transcript on GitHub.[00:00:00] Jeremy: Today I'm talking to Jonathan Shariat, he's the co-author of Tragic design. The host of the design review podcast. And he's currently a senior interaction designer and accessibility program lead at Google. Jonathan, welcome to software engineering radio.[00:00:15] Jonathan: Hi, Jeremy, thank you So much for having me on.[00:00:18] Jeremy: the title of your book is tragic design. And I think that people can take a lot of different meanings from that. So I wonder if you could start by explaining what tragic design means to you.[00:00:33] Jonathan: Hmm. For me, it really started with this story that we have in the beginning of the book. It's also online. Uh, I originally wrote it as a medium article and th that's really what opened my eyes to, Hey, you know, design has, is, is this kind of invisible world all around us that we actually depend on very critically in some cases.And So this story was about a girl, you know, a nameless girl, but we named her Jenny for the story. And in short, she came for treatment of cancer at the hospital, uh, was given the medication and the nurses that were taking care of her were so distracted with the software they were using to chart, make orders, things like that, that they miss the fact that she needed hydration and that she wasn't getting it.And then because of that, she passed away. And I still remember that feeling of just kind of outrage. And, you know, when we hear a lot of news stories, A lot of them are outraging. they, they touch us, but some of them, some of those feelings stay and they stick with you.And for me, that stuck with me, I just couldn't let it go because I think a lot of your listeners will relate to this. Like we get into technology because we really care about the potential of technology. What could it do? What are all the awesome things that could do, but we come at a problem and we think of all the ways it could be solved with technology and here it was doing the exact opposite.It was causing problems. It was causing harm and the design of that, or, you know, the way that was built or whatever it was failing Jenny, it was failing the nurses too, right? Like a lot of times we blame that end user and, and it caused it. So to me, that story was so tragic. Something that deeply saddened me and was regrettable and cut short someone's uh, you know, life and that's the definition of tragic, and there's a lot of other examples with varying degrees of tragic, but, um, you know, as we look at the impact technology has, and then the impact we have in creating those technologies that have such large impacts, we have a responsibility to, to really look into that and make sure we're doing as best of job as we can and avoid those as much as possible.Because the biggest thing I learned in researching all these stories was, Hey, these aren't bad people. These aren't, you know, people who are clueless and making these, you know, terrible mistakes. They're me, they're you, they're they're people. Um, just like you and I, that could make the same mistakes.[00:03:14] Jeremy: I think it's pretty clear to our audience where there was a loss of life, someone, someone died and that's, that's clearly tragic. Right? So I think a lot of things in the healthcare field, if there's a real negative outcome, whether it's death or severe harm, we can clearly see that as tragic.and I, I know in your book you talk about a lot of other types of, I guess negative things that software can cause. So I wonder if you could, explain a little bit about now past the death and the severe injury. What's tragic to you.[00:03:58] Jonathan: Yeah. still in that line of like of injury and death, And, you know, the side that most of us will actually, um, impact, our work day-to-day is also physical harm. Like, creating this software in a car. I think that's a fairly common one, but also, ergonomics, right?Like when we bring it back to something like less impactful, but still like multiplied over the impact of, multiplied over the impact of a product rather, it can be quite, quite big, right? Like if we're designing software in a way that's very repetitive or, you know, everyone's, everyone's got that, that like scroll, thumb, scroll, you know, issue.Right. if, uh, our phones aren't designed well, so there's a lot of ways that it can still physically impact you ergonomically. And that can cause you a lot of problem arthritis and pain, but yeah, there's, there's other, there's other, other ways that are still really impactful. So the other one is by saddening or angry.You know, that emotional harm is very real. And oftentimes sometimes it gets overlooked a little bit because it's, um, you know, physical harm is what is so real to us, but sometimes emotional harm isn't. But, you know, we talk about in the book, the example of Facebook, putting together this great feature, which takes your most liked photo, and, you know, celebrates your whole year by you saying, Hey, look at as a hero, you're in review this, the top photo from the year, they add some great, you know, well done illustrations behind it, of, of balloons and confetti and, people dancing.But some people had a bad year. Some people's most liked engaged photo is because something bad happened and they totally missed. And because of that, people had a really bad time with this where, you know, they lost their child that year. They lost their loved one that year, their house burnt down. Um, something really bad happened to them.And here was Facebook putting that photo of their, of their dead child up with, you know, balloons and confetti and people dancing around it. And that was really hard for people. They didn't want to be reminded of that. And especially in that way, and these emotional harms also come into the, in the play of, on anger.You know, we talk about, well, one, you know, there's, there's a lot of software out there that, that, um, tries to bring up news stories that anger us and which equals engagement. Um, but also ones that, um, use dark patterns to trick us into purchasing and buying and forgetting about that free trial. So they charge us for a yearly subscription and won't refund us.Uh, if you've ever tried to cancel a subscription, you start to see some real their their real colors. Um, so emotional harm and, uh, anger is a, is a big one. We also talk about injustice in the book where there are products that are supposed to be providing justice. Um, and you know, in very real ways like voting or, you know, getting people the help that they need from the government, or, uh, for people to see their loved ones in jail.Um, or, you know, you're getting a ticket unfairly because you couldn't read the sign was you're trying to read the sign and you, and you couldn't understand it. so yeah, we look at a lot of different ways that design and our saw the software that we create can have very real impact on people's lives and in a negative way, if we're not careful. [00:07:25] Jeremy: the impression I get, when you talk about tragic design, it's really about anything that could harm a person, whether physically, emotionally, you know, make them angry, make them sad. And I think the, the most liked photo example is a great one, because like you said, I think the people may be building something that, that harms and they may have no idea that they're doing it.[00:07:53] Jonathan: Exactly like that. I love that story because not, not to just jump on the bandwagon of saying bad things about like Facebook or something. No, I love that story because I can see myself designing the exact same thing, like being a part of that product, you know, building it, you know, looking at the, uh, the, the specifications, the, um, the, the PM, you know, put it that put together and the decks that we had, you know, like I could totally see that happening.And just never, I think, never having the thought, because our we're so focused on like delighting our users and, you know, we have these metrics and these things in mind. So that's why, like, in the book, we really talk about a few different processes that need to be part of. Product development cycle to stop, pause, and think about like, well, what are the, what are the negative aspects here?Like what are the things that could go wrong? What are the, what are the other life experiences that are negative? Um, that could be a part of this and you don't need to be a genius to think of every single thing out there. You know, like in this example, I think just talking about, you know, like, oh, well, some people might've had, you know, if they would have taken probably like, you know, one hour out of their entire project, or maybe even 10 minutes, they might've come up with like, oh, there could be bad thing.Right. But, um, so if you don't have that, that, that moment to pause that moment to just say, okay, we have time to brainstorm together about like how this could go wrong or how, you know, the negative of life could be impacted by this, um, feature that that's all that it takes. It doesn't necessarily mean that you need to do.You know, giant study around the impact, potential impact of this product and all the, all the ways, but really just having a part of your process that takes a moment to think about that will just create a better product and better, product outcomes. You know, if you think about all of life's experiences and Facebook can say, Hey, condolences, and like, you know, and show that thoughtfulness that would be, uh, I would have that have higher engagement that would have higher, uh, satisfaction, right?So they could have created a better outcome by considering these things and obviously avoid the impact negative impact to users and the negative impact to their product. [00:10:12] Jeremy: continuing on with that thought you're a senior interaction designer and you're an accessibility program lead. And so I wonder on the projects that you work on, and maybe you can give us a specific example, but how are you ensuring that you're, you're not running up against these problems where you build something that you think is going to be really great, um, for your users, but in reality ends up being harmful and specifically.[00:10:41] Jonathan: Yeah, one of the best ways is, I mean, it should be part of multiple parts of your cycle. If, if you want something, if you want a specific outcome out of your product development life cycle, um, it needs to be from the very beginning and then a few more times, so that it's not, you know, uh, I think, uh, programmers, uh, will all latch onto this, where they have the worst end of the stick, right?Because a and Q and QA as well. Because, you know, any bad decision or assumption that's happened early on with, you know, the, the business team or, or the PM, you know, gets like multiplied when they talk to the designer and then gets multiplied again, they hand it off. And it's always the engineer who has to, has to put the final foot down, be like, this doesn't make sense.Or I think users are going to react this way, or, you know, this is the implication of that, that assumption. So, um, it's the same thing, you know, in our team, we have it in the very early stage when someone's putting together the idea for the feature, our project, we want to work on it's right there. There's a few, there's like a section about accessibility and a few other sections, uh, talking about like looking out for this negative impact.So right away, we can have a discussion about it when we're talking about like what we should do about this and the D and the different, implications of implementing it. That's the perfect place for it. You know, like maybe, maybe when you're a brainstorm. Uh, about like, what should we should do? Maybe it's not okay there because you're trying to be creative.Right. You're trying to think. But at the very next step, when you're saying, okay, like what would it mean to build this that's exactly where I should start showing up and, you know, the discussion from the team. And it depends also the, the risk involved, right? Like, uh, it depends, which is attached to how much, uh, time and effort and resources you should put towards avoiding that risk it's risk management.So, you know, if you work, um, like my, um, you know, colleagues, uh, or, you know, some of my friends were working in the automotive industry and you're creating a software and you're worried that it might be distracting. There might be a lot more time and effort or the healthcare industry. Um, those were, those are, those might need to take a lot more resources, but if you're a, maybe a building, um, you know, SaaS software for engineers to spin up, you know, they're, um, you know resources.Um, there might be a different amount of resources. It never is zero, uh, because you still have, are dealing with people and you'll impact them. And, you know, maybe, you know, that service goes down and that was a healthcare service that went down because of your, you know, so you really have to think about what the risk is.And then you can map that back to how much time and effort you need to be spending on getting that. Right. And accessibility is one of those things too, where a lot of people think that it takes a lot of effort, a lot of resources to be accessible. And it really isn't. It just, um, it's just like tech debt, you know, if, if you have ignored your tech debt for, you know, five years, and then they're saying, Hey, let's all fix all the tech debt. Yeah. Nobody's going to be on board for that as much. Versus like, if, if addressing that and finding the right level of tech debt that you're okay with and when you address it and how, um, because, and just better practice. That's the same thing with accessibility is like, if you're just building it correctly, as you go, it's, it's very low effort and it just creates a better product, better decisions.Um, and it is totally worth the increased amount of people who can use it and the improved quality for all users. So, um, yeah, it's just kind of like a win-win situation.[00:14:26] Jeremy: one of the things you mentioned was that this should all start. At the very beginning or at least right after you've decided on what kind of product you're going to build, and that's going to make it much easier than if you come in later and try to, make fixes then, I wonder when you're all getting together and you're trying to come up with these scenarios, trying to figure out negative impacts, what kind of accessibility, needs you need to have, who are the people who are involved in that conversation?Like, um, you know, you have a team of 50 people who needs to be in the room from the very beginning to start working this out.[00:15:05] Jonathan: I think it would be the same people who are there for the project planning, like, um, at, on my team, we have our eng counter counterparts there. at least the team lead, if, if, if there's a lot of them, but you know, if they would go to the project kickoff, uh, they should be there.you know, we, we have everybody in their PM, design, engineers, um, our project manager, like anyone who wants to contribute, uh, should really be there because the more minds you have with this the better, and you'll, you'll tease out much, much more of, of of all the potential problems because you have a more, more, um, diverse set of brains and life experiences to draw from.And so you'll, you'll get closer to that 80% mark, uh, that you can just quickly take off a lot of those big items off the table, right? [00:16:00] Jeremy: Is there any kind of formal process you follow or is it more just, people are thinking of ideas, putting them out there and just having a conversation.[00:16:11] Jonathan: Yeah, again, it depends which industry you're in, what the risk is. So I previously worked at a healthcare industry, um, and for us to make sure that we get that right, and how it's going to impact the patients, especially though is cancer care. And they were using our product to get early warnings of adverse effects.Our, system of figuring that like, you know, if that was going to be an issue was more formalized. Um, in, in some cases, uh, like, like actually like healthcare and especially if the, if it's a device or, or in certain software circumstances, it's determined by the FDA to be a certain category, you literally have a, uh, governmental version of this.So the only reason that's there is because it can prevent a lot of harm, right? So, um, that one is enforced, but there's, there's reasons, uh, outside of the FDA to have that exact formalized part of your process. And it can, the size of it should scale depending on what the risk is. So on my team, the risk is, is actually somewhat low.it's really just part of the planning process. We do have moments where we, we, um, when we're, uh, brainstorming like what we should do and how the feature will actually work. Where we talk about like what those risks are and calling out the accessibility issues. And then we address those. And then as we are ready to, um, get ready to ship, we have another, um, formalized part of the process.There will be check if the accessibility has been taken care of and, you know, if everything makes sense as far as, you know, impact to users. So we have those places, but in healthcare, but it was much stronger where we had to, um, make sure that we re we we've tested it. We've, uh, it's robust. It's going to work on, we think it's going to work.Um, we, you know, we do user testing has to pass that user testing, things like that before we're able to ship it, uh, to the end user.[00:18:12] Jeremy: So in healthcare, you said that the FDA actually provides, is it like a checklist of things to follow where you must have done this? As you're testing and you must have verified these, these things that's actually given to you by the government.[00:18:26] Jonathan: That's right. Yeah. It's like a checklist and the testing requirement. Um, and there's also levels there. So, I have, I've only, I've only done the lowest level. I know. There's like, I think like two more levels above that. Um, and again, that's like, because the risk is higher and higher and there's more stricter requirements there where maybe somebody in the FDA needs to review it at some point.And, um, so again, like mapping it back to the risk that your company has is, is really important to understanding that is going to help you avoid and, and build a better product, avoid, you know, the bad impact and build a better product. And, and I think that's one of the things I would like to focus on as well.And I'd like to highlight for your, for your listeners, is that, it's not just about avoiding tragic design because one thing I've discovered since writing the book and sharing it with a lot of people. Is that the exact opposite thing is usually, you know, in a vast majority of the cases ends up being a strategically great thing to pursue for the product and the company.You know, if you think about, that, that example with, with Facebook, okay. You've run into a problem that you want to avoid, but if you actually do a 180 there and you find ways to engage with people, when they're grieving, you find people to, to develop features that help people who are grieving, you've created a value to your users, that you can help build the company off of.Right. Um, cause they were already building a bunch of joy features. Right. Um, you know, and also like user privacy, like I, we see apple doing that really well, where they say, okay, you know, we are going to do our ML on device. We are going to do, you know, let users decide on every permission and things like that.And that, um, is a strategy. We also see that with like something like T-Mobile, when they initially started out, they were like one of the nobody, uh, telecoms in the world. And they said, okay, what are all the unethical bad things that, uh, our competitors are doing? They're charging extra fees, you know, um, they have these weird data caps that are really confusing and don't make any sense their contracts, you get locked into for many years.They just did the exact opposite of that. And that became their business strategy and it, and it worked for them now. They're, they're like the top, uh, company. So, um, I think there's a lot of things like that, where you just look at the exact opposite and, you, one you get to avoid the bad, tragic design, but you also see boom, you see an opportunity that, um, become, become a business strategy.[00:21:03] Jeremy: So, so when you referred to exact opposite, I guess you're, you're looking for the potentially negative outcomes that could happen. there was the Facebook example of, of seeing a photo or being reminded of a really sad event and figuring out can I build a product around, still having that same picture, but recontextualizing it like showing you that picture in a way that's not going to make you sad or upset, but is actually a positive.[00:21:35] Jonathan: Yeah. I mean, I don't know maybe what the solution was, but like one example that comes to mind is some companies. Now, before mother's day, we'll send you an email and say, Hey, this is coming up. Do you want us to send you emails about mother's day? Because for some people that's Can, be very painful. That's that's very thoughtful.Right. And that's a great way to show that you, that you care. Um, but yeah, like, you know, uh, thinking about that Facebook example, like if there's a formalized way to engage with, with grieving, like, I would use Facebook for that. I don't use Facebook very often or almost at all, but you know, if somebody passed away, I would engage right with my, my Facebook account.And I would say, okay, look, there's like, there's this whole formalized, you know, feature around, you know, uh, and, and Facebook understands grieving and Facebook understands like this w this event and may like smooth that process, you know, creates comfort for the community that's value and engagement. that is worthwhile versus artificial engagement.That's for the sake of engagement. and that would create, uh, a better feeling towards Facebook. Uh, I would maybe like then spend more time on Facebook. So it's in their mutual interest to do it the right way. Um, and so it's great to focus on these things to avoid harm, but also to start to see new opportunities for innovation.And we see this a lot already in accessibility where there's so many innovations that have come from just fixing accessibility issues like closed captions. We all use it, on our TVs, in busy crowded spaces, on, you know, videos that have no, um, uh, translation for us in different places.So, SEO is, is the same thing. Like you get a lot of SEO benefit from, you know, describing your images and, and making everything semantic and things like that. And that also helps screen readers. and different innovations have come because somebody wanted to solve an accessibility need.And then the one I love, I think it's the most common one is readability, like contrast and tech size. Sure. There's some people who won't be able to read it at all, but it hurts my eyes to read bad contrast and bad text size. And so it just benefits. Everyone creates a better design. And one of the things that comes up so often when I'm, you know, I'm the accessibility program lead.And so I see a lot of our bugs is so many issues that, that are caught because of our, our audits and our, like our test cases around accessibility that just our bad design and our bad experience for everyone. And so we're able to fix that. And, uh, and it's just like an another driver of innovation and there's, there's, there's a ton of accessibility examples, and I think there's also a ton of these other, you know, ethical examples or, you know, uh, avoiding harm where you just can see it. It's an opportunity area where it's like, oh, let's avoid that. But then if you turn around, you can see that there's a big opportunity to create a business strategy out of it.[00:24:37] Jeremy: Can, can you think of any specific examples where you've seen that? Where somebody, you know, doesn't treat it as something to avoid, but, but actually sees that as an opportunity.[00:24:47] Jonathan: Yeah. I mean, I, I think that the, um, the apple example is a really good one where from the beginning, like they, they saw like, okay, in the market, there's a lot of abuse of information and people don't like that. So they created a business strategy around that And that's become a big differentiator for them.Right. Like they, they have like ML on the device. They do. Um, they have a lot of these permission settings, you know, the Facebook. It was very much focused right. On, on using customer data and a lot of it without really asking their permission. And so once apple said, okay, now all apps need to show what you're tracking.And, and then, um, and asked for permission to do that. A lot of people said no, and that caused about $10 billion of loss for, for Facebook. and for, for apple, it's, you know, they advertise on that now that we're, you know, ethical that, you know, we, we source things ethically and we, we care about user privacy and that's a strong position, right?Uh, I think there's a lot of other examples out there. Like I mentioned accessibility and others, but like it they're kind of overflowing, so it's hard to pick one.[00:25:58] Jeremy: Yeah. And I think what's interesting about that too, is with the example of focusing on user privacy or trying to be more sensitive around, death or things like that, as I think that other people in the industry will, will notice that, and then in their own products, then they may start to incorporate those things as well.[00:26:18] Jonathan: Yeah. Yeah, exactly what the example of with T-Mobile. once that worked really, really well and they just ate up the entire market, all the other companies followed suit, right? Like now, um, having those data caps that, you know, are, are very rare, having those surprise fees are a lot, uh, rare.Um, you know, there's, there's no more like deep contracts that lock you in and et cetera, et cetera. A lot of those have become industry standard now. Um, and so It, and it does improve the environment for everyone because, because now it becomes a competitive advantage that everybody needs to meet. Um, so yeah, I think that's really, really important.So when you're going through your product's life cycle, you might not have the ability to make these big strategic decisions. Like, you know, we want to, you know, not have data caps or whatever, but, you know, if you, if you're on that Facebook level and you run into that issue, you could say, well, look, what could we do to address this?What could we could do to, to help this and make, make that a robust feature? You know, when we talk about, lot of these dating apps, one of the problems was a lot of abuse, where women were being harassed or, you know, after the day didn't go well and you know, things were happening. And so a lot of apps have now dif uh, these dating apps have differentiated themselves and attracted a lot of that market because they deal with that really well.And they have, you know, it's built into the strategy. It's oftentimes like a really good place to start too, because one it's not something we generally think about very, very well, which means your competitors. Haven't thought about it very well, which means it's a great place to, to build products, ideas off of. [00:27:57] Jeremy: Yeah, that's a good point because I think so many applications now are like social media applications, their messaging applications there, their video chat, that sort of thing. I think when those applications were first built, they didn't really think so much about what if someone is, you know, sending hateful messages or sending, pictures that people really don't want to see.Um, people are doing abusive things. It was like, they just assume that, oh, people will be, people will be good to each other and it'll be fine. But, uh, you know, in the last 10 years, pretty much all of the major social media companies have tried to figure out like, okay, um, what do I do if someone is being abusive and, and what's the process for that?And basically they all have to do something now. Um,Um [00:28:47] Jonathan: Yeah. And that's a hard thing to like, if, if that, uh, unethical or that, um, bad design decision is deep within your business strategy and your company's strategy. It's hard to undo that like some companies are still, still have to do that very suddenly and deal with it. Right. Like, uh, I know Uber had a big, big part of them, like, uh, and some other companies, but, uh, we're like almost suddenly, like everything will come to a head and they'll need to deal with it.Or, you know, like, Twitter now try to try to get, be acquired by Elon Musk. Uh, some of those things are coming to light, but, I, what I find really interesting is that these these areas are like really ripe for innovation. So if you're interested in, a startup idea or you're, or you're working in a startup, or, you know, you're about to start one, you know, there's a lot of maybe a lot of people out there who are thinking about side projects right now, this is a great way to differentiate and win that market against other well-established competitors is to say, okay, well, what are they, what are they doing right now that is unethical. And it's like, you know, core to their business strategy and doing that differently is really what will help you, to win that market. And we see that happening all the time, you know, especially the ones that are like these established, uh, leaders in the market. they can't pivot like you can, so being able to say, I'm, we're going to do this ethically.We're going to do this, uh, with, you know, with these tragic design in mind and doing the opposite, that's going to help you to, to find your, your attraction in the market.[00:30:25] Jeremy: Earlier, we were talking about. How in the medical field, there is specific regulation or at least requirements to, to try and avoid this kind of tragic design. Uh, I noticed you also worked for Intuit before. Uh, um, so for financial services, I was wondering if there was anything similar where the government is stepping in and saying like, you need to make sure that, these things happen to avoid, these harmful things that can come up.[00:30:54] Jonathan: Yeah, I don't know. I mean, I didn't work on TurboTax, so I worked on QuickBooks, which is like a accounting software for small businesses. And I was surprised, like we didn't have a lot, like a lot of those robust things, we just relied on user feedback to tell us like, things were not going well. And, you know, and I think we should have, like, I think, I think that that was a missed opportunity, um, to.Show your users that you understand them and you care, and to find those opportunity areas. So we didn't have enough of that. And there was things that we shipped that didn't work correctly right out of the box, which, you know, it happens, but had a negative impact to users. So it's like, okay, well, what do we do about that?How do we fix that? Um, and if the more you formalize that and make it part of your process, the more you get out of it. And actually this is like, this is a good, a good, um, uh, pausing point bit that I think will affect a lot of engineers listening to this. So if you remember in the book, we talk about the Ford Pinto story and there isn't, I want to talk about this story and why I added it to the book.Is that, uh, one, I think this is the thing that engineers deal with the most, um, and, and designers do too, which is that okay. we see the problem, but we don't think it's worth fixing. Okay. Um, so that, that's what I'm going. That's what we're going to dig into here. So it's a, hold on for a second while I explain some, some history about this car.So the Ford Pinto, if you're not familiar is notorious, uh, because it was designed, um, and built and shipped and there, they knowingly had this problem where if it was rear-ended at even like a pretty low speed, it would burst into flames because the gas tank would rupture the, and then oftentimes the, the, the doors would get jammed.And so it became a death trap of fire and caused many deaths, a lot of injuries. And, um, in an interview with the CEO at the time, like almost destroyed Ford like very seriously would have brought the whole company down and during the design of it, uh, and design meaning in the engineering sense. Uh, and the engineering design of it, they say they found this problem and the engineers came up with their best solution.Was this a rubber block. Um, and the cost was, uh, I forget how many dollars let's say it was like $9. let's say $6, but this is again, uh, back then. And also the margin on these cars was very, very, very thin and very important to have the lowest price in the market to win those markets. The customers were very price sensitive, so they, uh, they being like the legal team looked at like some recent, cases where they have the value of life and started to come up with like a here's how many people would sue us and here's how much it would cost to, uh, to, to settle all those.And then here's how much it would cost to add this to all the cars. And it was cheaper for them to just go with the lawsuits and they, they found. Um, and I think why, I think why this is so important is because of the two things that happened afterward, one, they were wrong. it was a lot more people it affected and the lawsuits were for a lot more money.And two after all this was going crazy and it was about to destroy the company, they went back to the drawing board and what did the engineers find? They found a cheaper solution. They were able to rework that, that rubber block and and get it under the margin and be able to hit the mark that they wanted to.And I think that's, there's a lot of focus on the first part because it's so unethical to the value of life and, and, um, and doing that calculation and being like we're willing to have people die, but in some industries, it's really hard to get away with that, but it's also very easy. To get into that.It's very easy to get lulled into this sense of like, oh, we're just going to crunch the numbers and see how many users it affects. And we're okay with that. Um, versus when you have principals and you have kind of a hard line and you, and you care a lot more than you should. And, and you really push yourself to create a more ethical, more, a safer, you know, avoiding, tragic design, then you, there there's a solution out there.Like you actually get to innovation, you actually get to the solving the problem versus when you just rely on, oh, you know, the cost benefit analysis we did is that it's going to take an engineer in a month to fix this and blah blah blah. But if, if you have those values, if you have those principles and you're like, you know what, we're not okay shipping this, then you'll, you'll find that.They're like, okay, there's, there's a cheaper way to, to fix this. There's another way we could address this. And that happens so often. and I know a lot of engineers deal with that. A lot of saying like, oh, you know, this is not worth our time to fix. This is not worth our time to fix. And that's why you need those principles is because oftentimes you don't see it and it's, but it's right there at right outside of the edge of your vision. [00:36:12] Jeremy: Yeah. I mean, with the Pinto example, I'm just picturing, you know, obviously there wasn't JIRA back then, but you can imagine that somebody's having an issue that, Hey, when somebody hits the back of the car, it's going to catch on fire. Um, and, and going like, well, how do I prioritize that? Right? Like, is this a medium ticket?Is this a high ticket? And it's just like, it's just, it just seems insane, right? That you could, make the decision like, oh no, this isn't that big an issue. You know, we can move it down to low priority and, and, and, ship it.Okay. [00:36:45] Jonathan: Yeah. And, and, and that's really what principals do for you, right? Is they help you make the tough decisions. You don't need a principle for an easy one. Uh, and that's why I really encourage people in the book to come together as a team and come up with what are your guiding principles. Um, and that way it's not a discussion point every single time.It's like, Hey, we've agreed that this is something that we, that we're going to care about. This is something that we are going to stop and, fix. Like, one of the things I really like about my team at Google is product excellence is very important to us. and. there are certain things that, uh, we're, you know, we're Okay. with, um, letting slip and fixing at a next iteration.And, you know, obviously we make sure we actually do that. Um, so it's not like we, we, we always address everything, but because it's one of our principles. We care more. We have more, we take on more of those tickets and we take on more of those things and make sure that they ship before, um, can make sure that they're fixed before we ship.And, and it shows like to the end user that th that this company cares and they have quality. Um, so it's one of it. You need a principal to kind of guide you through those difficult things that aren't obvious on a decision to decision basis, but, you know, strategically get you in somewhere important, you know, and, and like, like design debt or, um, our technical debt where it's like, this should be optimized, you know, this chunk of code, like, nah, but you know, in, in it grouping together with a hundred of those decisions.Yeah. It's gonna, it's gonna slow it down every single project from here on out. So that's why you need those principles.[00:38:24] Jeremy: So in the book, uh, there are a few examples of software in healthcare. And when you think about principles, you would think. Generally everybody on the team would be on board that we want to give whatever patient that's involved. We want to give them good care. We want them to be healthy. We don't want them to be harmed.And given that I I'm wondering because you, you interviewed multiple people in the book, you have a few different case studies. Um, why do you think that medical software in particular seems to be, so it seems to have such poor UX or has so many issues.[00:39:08] Jonathan: Yeah, that's a, complicated topic. I would summarize it with a few, maybe three different reasons. Um, one which I think is, uh, maybe a driving factor of, of some of the other ones. Is that the way that the medical, uh, industry works is the person who purchases the software. It's not the end user. So it's not like you have doctors and nurses voting on, on which software to use.Um, and so oftentimes it's, it's more of like a sales deal and then just gets pushed out and they, and they also have to commit to these things like, um, the software is very expensive and, uh, initially with, you know, like in the early days was very much like it needs to be installed, maintain, there has to be training.So there was a lot to money to be made, in those, in that software. And, and so the investment from the hospital was a lot, so they can't just be like, oh, can it be to actually, don't like this one, we're going to switch to the next one. So, because like, once it's sold, it's really easy to just like, keep that customer.There's very little incentive to like really improve it unless you're selling them a new feature. So there's a lot of feature add ons. Because they can charge more for those, but improving the experience and all that kind of stuff. There is less of that. I think also there's just generally a lot less like, uh, understanding of design, in that field.And there's a lot more because there's sort of like traditions of things. they end up putting a lot of the pressure and the, that responsibility on the end individuals. So, you know, you've heard recently of that nurse who made a medication error and she's going to jail for that. And sh you know, And oftentimes we blame that end, that end person.So the, the nurse gets all the blame or the doctor gets all the blame. Well, what about the software, you know, who like made that confusing or, you know, what about the medication that looks exactly like this other medication? Or what about the pump tool that you have to, you know, type everything in very specifically, and the nurses are very busy.They're doing a lot of work. There's a 12 hour shifts. They're dealing with lots of different patients, a lot of changing things for them to have to worry about having to type something a specific way. And yet when those problems happen, what do they do? They don't go in like redesign the devices. Are they more training, more training, more training, more training, and people only can absorb so much training.and so I think that's part of the problem is that like, there's no desire to change. They blame the end, the wrong person, and. Uh, lastly, I think that, um, it is starting to change. I think we're starting to see like the ability for, because of the fact that the government is pushing healthcare records to be more interoperable, meaning like I can take my health records anywhere, that a lot of the power comes in where the data is.And so, um, I'm hoping that, uh, you know, as the government and people and, um, and initiatives push these big companies, like epic to be more open, that things will improve. One is because they'll have to keep up with their competitors and that more competitors will be out there to improve things. Because I, I think that there's, there's the know-how out there, but like, because the there's no incentive to change and, and, and there's no like turnover and systems and there's the blaming of the end user.We're not going to see a change anytime soon.[00:42:35] Jeremy: that's a, that's a good point in terms of like, it, it seems like even though you have all these people who may have good ideas may want to do a startup, uh, if you've got all these hospitals that already locked into this very expensive system, then yeah. Where's, where's the room to kind of get in there in and have that change.[00:42:54] Jonathan: yeah. [00:42:56] Jeremy: Uh, another thing that you talk about in the book is about how, when you're in a crisis situation, the way that a user interacts with something is, is very different. And I wonder if you have any specific examples for software when, when that can happen.[00:43:15] Jonathan: yeah. Designing for crisis is a very important part of every software because, it might be hard for you to imagine being in that situation, but, it, it definitely will still happen so. one example that comes to mind is, uh, you know, let's say you're working on a cloud, um, software, like, uh, AWS or Google cloud.Right. there's definitely use cases and user journeys in your product where somebody would be very panicked. Right. Um, and if you've ever been on an on-call with, with something and it goes south, and it's a big deal, you don't think. Right. Right. Like when we're in crisis, our brains go into a totally different mode of like that fight or flight mode.And we don't think the way we do, it's really hard to read and comprehend very hard. and we might not make this, the right decisions and things like that. So, you know, thinking about that, like maybe your, your let's say, like, going back to that, the cloud software, like let's say you're, you're, you're working on that, like.Are you relying on the user reading a bunch of texts about this button, or is it very clear from the way you've crafted that exact button copy and how big it is? And, and it's where it is relation to a bunch of other content? Like what exactly it does. It's going to shut down the instance where it's gonna, you know, it's, it's gonna, do it at a delay or whatever, like be able to all those little decisions, like are really impactful.And when you, when you run them through the, um, the, the furnace of, of, of, uh, um, a user journey that's relying on, on a really urgent situation, you'll obviously help that. And you'll, you'll start to see problems in your UI that you hadn't noticed before, or, or different problems in the way you're implementing things that you didn't notice before, because you're seeing it from a different way.And that's one of the great things about, um, the, the systems and the book that we talk about around, like, thinking about how things could go wrong, or, you know, thinking about, you know, designing for crisis. Is it makes you think of some new use cases, which makes you think of some new ways to improve your product.You know, that improvement you make to make it so obvious that someone could do it in a crisis would help everyone, even when they're not in a crisis. Um, so that, that's why it's important to, to focus on those things. [00:45:30] Jeremy: And for someone who is working on these products, it's kind of hard to trigger that feeling of crisis. If there isn't actually a crisis happening. So I wonder if you can talk a little bit about how you, you try to design for that when it's not really happening to you. You're just trying to imagine what it would feel like.[00:45:53] Jonathan: yeah. Um, you're never really going to be able to do that. Like, so some of it has to be simulated, One of the ways that we are able to sort of simulate what we call cognitive load. Which is one of the things that happen during a crisis. But what also happened when someone's very distracted, they might be using your product while they're multitasking.We have a bunch of kids, a toddler constantly pulling on their arm and they're trying to get something done in your app. So, one of the ways that has been shown to help, uh, test that is, um, like the foot tapping method. So when you're doing user research, you have the user doing something else, like tapping or like, You know, uh, make it sound like they have a second task that they're doing on the side.It's manageable, like tapping their feet and their, their hands or something. And then they also have to do your task. Um, so like you can like build up what those tabs with those extra things are that they have to do while they're also working on, uh, finishing the task you've given them. and, and that's one way to sort of simulate cognitive load.some of the other things is, is really just, um, you know, listening to users, stories and, and find out, okay, this user was in crisis. Okay, great. Let's talk to them and interview them about that. Uh, if it was fairly recently within like the past six months or something like that. but, but sometimes you don't like, you just have to run through it and do your best.Um, and you know, those black Swan events or those, even if you're able to simulate it yourself, like put your, put your, put yourself into that exact position and be in panic, which, you know, you're not able to, but if you were that still would only be your experience and you wouldn't know all the different ways that people could experience this.So, and there's going to be some point in time where you're gonna need to extrapolate a little bit and, you know, extrapolate from what you know, to be true, but also from user testing and things like that. And, um, and then wait for a real data [00:47:48] Jeremy: You have a chapter in the book on design that angers and there were, there were a lot of examples in there, on, on things that are just annoying or, you know, make you upset while you're using software. I wonder for like our audience, if you could share just like a few of your, your favorites or your ones that really stand out.[00:48:08] Jonathan: My favorite one is Clippy because, um, you know, I remember growing up, uh, you know, writing software, writing, writing documents, and Clippy popping up. And, I was reading an article about it and obviously just like everybody else, I hated it. You know, as a little character, it was fun, but like when you're actually trying to get some work done, it was very annoying.And then I remember, uh, a while later reading this article about how much work the teams put into clubby. Like, I mean, if you think about it now, It had a lot of like, um, so the AI that we're playing with just now, um, around like natural language processing, understanding, like what, what type of thing you're writing and coming up with contextualized responses, like it was pretty advanced for the, uh, very advanced for the time, you know, uh, adding animation triggers to that and all, all that.Um, and they had done a lot of user research. I was like, what you did research in, like you had that reaction. And I love that example because, oh, and also by the way, I love how they, uh, took Clippy out and S and highlighted that as like one of the features of the next version of the office, uh, software.but I love that example again, because I see myself in that and, you know, you ha you have a team doing something technologically amazing doing user research, uh, and putting out a very great product, but he totally missing. And a lot of products do that. A lot of teams do that. And why is that? It's because they're, um, they're not thinking about, uh, they're putting their, they're putting the business needs or the team's needs first and they're putting the user's needs second.And whenever we do that, whenever we put ourselves first, we become a jerk, right? Like if you're in a relationship and you're always putting yourself first, that relationship is not going to last long or it's not going to go very well. And yet we Do that with our relationship with users where we're constantly just like, Hey, well, what is the business?The business wants users to not cancel here so let's make it very difficult for people to cancel. And that's a great way to lose customers. That's a great way to create, this dissonance with your, with your users. And, um, and so if you, if you're, focused on like, this is what the we need to accomplish with the users, and then you work backwards from.You're you're, you're, you're, you're lower your chances of missing it, of getting it wrong of angering your users. and const always think about like, you sometimes have to be very real with yourselves and your team. And I think that's really hard for a lot of teams because we have we don't want to look bad.We don't want to, but what I found is those are the people who actually, um, get promoted. Like, you know, if you look at the managers and directors and stuff, those are the people who can be brutally honest. Right. Um, who can say, like, I don't think this is ready. I don't, I don't think this is good. And so you actually, I, I, you know, I've done that in the front of like our CEO and things like that.And I've always had really good responses from them to say, like, we really appreciate that you, you know, uh, you can call that out and you can just call it like, it is like, Hey, this is what we see this user. Maybe we shouldn't do this at all. Maybe. Um, and that can, uh, you know, at Google that's one of the criteria that we have in our software engineers and the designers of being able to spot things that are, you know, things that we shouldn't should stop doing.Um, and so I think that's really important for the development of, of a senior engineer, uh, to be able to, to know that that's something like, Hey, this project, I would want it to work, but in its current form is not good. And being able to call that out is very important.[00:51:55] Jeremy: Do you have any specific examples where there was something that was like very obvious to you? To the rest of the team or to a lot of other people that wasn't.[00:52:06] Jonathan: um, yeah, so here's an example I finally got, I was early on in my career and I finally got to lead in our whole project. So we are redesigning our business micro-site um, and I got to, I got, uh, assigned two engineers and another designer and I got to lead the whole. I was, I was like, this is my chance.Right? So, and we had a very short timeline as well, and I put together all these designs. And, um, one of the things that we aligned on at the time was like as really cool, uh, so I put together this really cool design for the contact form, where you have like, essentially, I kind of like ad-lib, it looks like a letter.and you know, by the way, give me a little bit of, of, uh, of, of leeway here. Cause this was like 10 years ago, but, uh, it was like a letter and you would say like, you're addressing it to our company. And so it had all the things we wanted to get out of you around like your company size, your team, like, and so our sales team would then reach out to this customer.I designed it and I had shown it to the team and everybody loved it. Like my manager signed off on it. Like all the engineers signed off on it, even though we had a short timeline, they're like, yeah, well we don't care. That's so cool. We're going to build it. But as I put it through that test of like, does this make sense for the, what the user wants answers just kept saying no to me.So I had to go and back in and pitch everybody and argue with them around not doing the cool idea that I wanted to do. And, um, eventually they came around and that form performed once we launched it performed really well. And I think about like, what if users had to go through this really wonky thing?Like this is the whole point of the website is to get this contact form. It should be as easy and as straightforward as possible. So I'm really glad we did that. And I can think of many, many more of those situations where, you know, um, we had to be brutally honest with ourselves with like this isn't where it needs to be, or this isn't what we should be doing.And we can avoid a lot of harm that way too, where it's like, you know, I don't, I don't think this is what we should be building. Right.[00:54:17] Jeremy: So in the case of this form, was it more like you, you had a bunch of drop-downs or S you know, selections where you would say like, okay, these are the types of information that I want to get from the person filling out the form as a company. but you weren't looking so much at, as the person filling out the form, this is going to be really annoying.Was that kind [00:54:38] Jonathan: exactly, exactly. Like, so their experience would have been like, they come up, they come at the end of this page or on like contact us and it's like a letter to our company. And like, we're essentially putting words in their mouth because they're, they're filling out the, letter. Um, and then, yeah, it's like, you know, you have to like read and then understand like what, what that part of this, the, the page was asking you and, you know, versus like a form where you're, you know, it's very easy.Well-known bam. You're, you're you're on this page. So you're interested in, so like, get it, get them in there. So we were able to, to decide against that and that, you know, we, we also had to, um, say no to a few other things, but like we said yes, to some things that were great, like responsive design, um, making sure that our website worked at every single use case, which is not like a hard requirement at the time, but was really important to us and ended up helping us a lot because we had a lot of, you know, business people who are on their phone, on the go, who wanted to, to check in and fill out the form and do a bunch of other stuff and learn about us.So that, that, that sales, uh, micro-site did really well because I think we made the right decisions and all those kinds of areas. And like those, those general, those principles helped us say no to the right things, even though it was a really cool thing, it probably would have looked really great in my portfolio for a while, but it just wasn't the right thing to do for the, the, the goal that we had.[00:56:00] Jeremy: So did it end up being more like just a text box? You know, a contact us fill in. Yeah.[00:56:06] Jonathan: You know, with usability, you know, if someone's familiar with something and it's, it's tired, everybody does it, but that means everybody knows how to use it. So usability constantly has that problem of innovation being less usable. Um, and so sometimes it's worth the trade-off because you want to attract people because of the innovation and they'll bill get over that hump with you because the innovation is interesting.So sometimes it's worth it and sometimes it's not, and you really have to, I'd say most times it's not. Um, and So you have to find like, what is, when is it time to innovate and when is it time to do the what's tried and true. Um, and on a business microsite, I think it's time to do tried and true. [00:56:51] Jeremy: So in your research for the book and all the jobs you've worked previously, are there certain. Mistakes or just UX things that you've noticed that you think that our audience should know about?[00:57:08] Jonathan: I think dark patterns are one of the most common, you know, tragic design mistakes that we see, because again, you're putting the company first and the user second. And you know, if you go to a trash, sorry, if you go to a dark patterns.org, you can see a great list. Um, there's a few other sites that have a nice list of them and actually Vox media did a nice video about, uh, dark patterns as well.So it's gaining a lot of traction, but you know, things like if you try to cancel your search, like Comcast service or your Amazon service, it's very hard. Like I think I wrote this in the book, but. Literally re researched what's the fastest way to delete it to, to, you know, uh, remove your Comcast account.I prepared everything. I did it through chat because that was the fastest way for first, not to mention finding chat by the way was very, very hard for me. Um, so I took me, even though I was like, okay, I have to find I'm going to do it through chat. I'm gonna do all this. It took me a while to find like chat, which I couldn't find it.So once I finally found it from that point to deleting from having them finally delete my account was about an hour. And I knew what to do going in just to say all the things to just have them not bother me. So th that's on purpose they've purposely. Cause it's easier to just say like fine, I'll take the discount thing.You're throwing in my face at the last second. And it's almost become a joke now that like, you know, you have to cancel your Comcast every year, so you can keep the costs down. Um, you know, and Amazon too, like trying to find that, you know, delete my account is like so buried. You know, they do that on purpose and a lot of companies will do things like, you know, make it very easy to sign up for a free trial and, and hide the fact that they're going to charge you for a year high.The fact that they're automatically going to bill you not remind you when it's about to expire so that they can like surprise, get you in to forget about this billing subscription or like, you know, if you've ever gotten Adobe software, um, they are really bad at that. They, they trick you into like getting this like monthly sufficient, but actually you've committed to a year.And if you want to cancel early, we'll charge you like 80% of the year. And, uh, and there's a really hard to contact anybody about it. So, um, it happens quite often. If the more you read into those, um, different things, uh, different patterns, you'll start to see them everywhere. And users are really catching onto a lot of those things and are responding.To those in a very negative way. And like, um, we recently, uh, looked at a case study where, you know, this free trial, um, this company had a free trial and they had like the standard free trial, um, uh, kind of design. And then their test was really just focusing on like, Hey, we're not going to scam you. If I had to summarize that the entire direction of the second one, it was like, you know, cancel any time.Here's exactly how much you'll be charged. And on the, it'll be on this date, uh, at five days before that we'll remind you to cancel and all this stuff, um, that ended up performing about 30% better than the other one. And the reason is that people are now burned by that trick so much so that every time they see a free trial, they're like, forget it.I don't, I don't want to deal with all this trickery. Like, oh, I didn't even care about to try the product versus like. We were not going to trick you. We really want you to actually try the product and, you know, we'll make sure that if you're not wanting to move forward with this, that you have plenty of time and plenty of chances to lead and that people respond to that now.So that's what we talked about earlier in the show of doing the exact opposite. This is another example of that. [01:00:51] Jeremy: Yeah, because I think a lot of people are familiar with, like you said, trying to cancel Comcast or trying to cancel their, their New York times subscription. And they, you know, everybody is just like, they get so mad at the process, but I think they also may be assume that it's a positive for the company, but what you're saying is that maybe, maybe that's actually not in the company's best interest.[01:01:15] Jonathan: Yeah. Oftentimes what we find with these like dark patterns or these unethical decisions is that th they are successful because, um, when you look at the most impactful, like immediate metric, you can look at, it looks like it worked right. Like, um, you know, let's say for that, those free trials, it's like, okay, we implemented like all this trickery and our subscriptions went up.But if you look at like the end, uh, result, um, which is like farther on in the process, it's always a lot harder to track that impact. But we all know, like when we look at each other, like when we, uh, we, we, we talk to each other about these different, um, examples. Like we know it to be true, that we all hate that.And we all hate those companies and we don't want to engage with them. And we don't, sometimes we don't use the products at all. So, um, yeah, it, it, it's, it's one of those things where it actually has like that, very real impact, but harder to track. Um, and so oftentimes that's how these, these patterns become very pervasive is the oh, and page views went up, uh, this was, this was a really, you know, this is high engagement, but it was page views because people were refreshing the page trying to figure out where the heck to go. Right. So um, oftentimes they they're less effective, but they're easier to track[01:02:32] Jeremy: So I think that's, that's a good place to, to wrap things up, but, um, if people want to check out the book or learn more about what you're working on your podcast, where should they head?[01:02:44] Jonathan: Um, yeah, just, uh, check out tragic design.com and our podcast. You can find on any of your podcasting software, just search design review podcast. [01:02:55] Jeremy: Jonathan, thank you so much for joining me on software engineering radio.[01:02:59] Jonathan: alright, thanks Jeremy. Thanks everyone. And, um, hope you had a good time. I did.
This episode originally aired on Software Engineering Radio.Randy Shoup is the VP of Engineering and Chief Architect at eBay. He was previously the VP of Engineering at WeWork and Stitch Fix, a Director of Engineering at Google Cloud where he worked on App Engine, and a Chief Engineer and Distinguished Architect at eBay in 2004. Topics covered: eBay's origins as a single C++ class The five-year migration to Java services Sharing a database between the old and new systems Building a distributed tracing system Working with bare metal Why most companies should stick to cloud Why individual services should own their own data storage How scale has caused solutions to change Rejoining a former company The Accelerate Book Improving delivery time. Related Links:@randyshoupOpenTelemetryLightStepHoneycombAccelerate BookThe MemoValue Stream MappingThe Epic Story of Dropbox's Exodus from the Amazon Cloud EmpireTranscript:[00:00:00] Jeremy: Today, I'm talking to Randy Shoup, he's the VP of engineering and chief architect at eBay.[00:00:05] Jeremy: He was previously the VP of engineering at WeWork and stitch fix, and he was also a chief engineer and distinguished architect at eBay back in 2004. Randy, welcome back to software engineering radio. This will be your fifth appearance on the show. I'm pretty sure that's a record.[00:00:22] Randy: Thanks, Jeremy, I'm really excited to come back. I always enjoy listening to, and then also contributing to software engineering radio.Back at, Qcon 2007, you spoke with Markus Volter he's he was the founder of SE radio. And you were talking about developing eBay's new search engine at the time.[00:00:42] Jeremy: And kind of looking back, I wonder if you could talk a little bit about how eBay was structured back then, maybe organizationally, and then we can talk a little bit about the, the tech stack and that sort of thing.[00:00:53] Randy: Oh, sure. Okay. Yeah. Um, so eBay started in 1995. I just want to like, you know, orient everybody. Same, same as the web. Same as Amazon, same as a bunch of stuff. So E-bay was actually almost 10 years old when I joined. That seemingly very old first time. Um, so yeah. What was ebay's tech stack like then? So E-bay current has gone through five generations of its infrastructure.It was transitioning between the second and the third when I joined in 2004. Um, so the. Iteration was Pierre Omidyar, the founder three-day weekend three-day labor day weekend in 1995, playing around with this new cool thing called the web. He wasn't intending to build a business. He just was playing around with auctions and wanted to put up a webpage.So he had a Perl backend and every item was a file and it lived on this little 486 tower or whatever you had at the time. Um, so that wasn't scalable and wasn't meant to be. The second generation of eBay's architecture was what we called V2 very, you know, creatively, uh, that was a C++ monolith. Um, an ISAPI DLL with essentially well at its worst, which grew to 3.4 million lines of code in that single DLL and basically in a single class, not just in a single, like repo or a single file, but in a single class.So that was very unpleasant to work in. As you can imagine, um, eBay had about a thousand engineers at the time and they were, you know, as you can imagine, like really stepping on each other's toes and not being able to make much forward progress. So starting in, I want to call it 2002. So two years before I joined, um, they were migrating to the creatively named V3 and V3 architecture was Java, and.you know, not microservices, but like we didn't even have that term, but it wasn't even that it was mini applications. So I'm actually going to take a step back. V2 was a monolith. So like all of eBay's code in that single DLL and like that was buying and selling and search and everything. And then we had two monster databases, a primary and a backup big Oracle machines on some hardware that was bigger, you know, bigger than refrigerators and that ran eBay for a bunch of years, before we changed the upper part of the stack, we, um, chopped up the, that single monolithic database into a bunch of, um, domain specific databases or entity specific databases, right?So a set of databases around users, you know, sharded by the user ID could talk about all that. If you want, you know, items again, sharded by item ID transactions, sharded by transaction ID... I think when I joined, it was the several hundred instances of, uh, Oracle databases, um, you know, spread around, but still that monolithic front end.And then in 2002, I wanna say we started migrating into that V3 that I was saying, okay. So that's, uh, that was a rewrite in Java, again, many applications. So you take the front end and instead of having it be in one big unit, it was this, uh, ER file, EAR, file, if run and people remember back to, you know, those stays in Java, um, you know, 220 different of those.So like here is the, you know, one of them for the search pages, you know, so the, you know, one application be the search application and it would, you know, do all the search related stuff, the handful of pages around search, uh, ditto for, you know, the buying area, ditto for the, you know, checkout area, ditto for the selling area...220 of those. Um, and that was again, domain, um, vertically sliced domains. And then the relationship between those V3, uh, applications and the databases was a many to many things. So like many applicants, many of those applications interact with items. So they would interact with those item databases. Many of them would interact with users.And so they would interact with a user databases, et cetera, uh, happy to go into as much gory detail as you want about all that. But like, that's what, uh, but we were in the transition period. You know, when I, uh, between the V2 monolith to the V3 mini applications in, uh, 2004, I'm just going to pause there and like, let me know where you want to take it.[00:05:01] Jeremy: Yeah. So you were saying that it was, um, it started as Perl, then it became a C++, and that's kind of interesting that you said it was all in one class, right? So it's wow. That's gotta be a gigantic [00:05:16] Randy: I mean, completely brutal. Yeah. 3.4 million lines of code. Yeah. We were hitting compiler limits on the number of methods per class.[00:05:22] Jeremy: Oh my gosh.[00:05:23] Randy: I'm, uh, uh, scared that I have that. I happen to know that at least at the time, uh, Microsoft allowed you 16 K uh, methods per class, and we were hitting that limit.So, uh, not great.[00:05:36] Jeremy: So it's just kind of interesting to think about how do you walk through that code, right? You have, I guess you just have this giant file.[00:05:45] Randy: Yeah. I mean, there were, you know, different methods. Um, but yeah, it was a big man. I mean, it was a monolith, it was, uh, you know, it was a spaghetti mess. Um, and you know, as you can imagine, Amazon went through a really similar thing by the way. So this wasn't soup. I mean, it was bad, but like we weren't the only people that were making that, making that a mistake.Um, and just like Amazon, where they were, uh, they did like one update a quarter (laughs) , you know, at that period, like 2000, uh, we were doing something really similar, like very, very slow. Um, you know, updates and, uh, when we moved to V3, you know, the idea was to get to do changes much faster. And we were very proud of ourselves starting in 2004 that we, uh, upgraded the whole site every two weeks.And we didn't have to do the whole site, but like each of those individual applications that I was mentioning, right. Those 220 applications, each of those would roll out on this biweekly cadence. Um, and they had interdependencies. And so we rolled them out in this dependency order in any way, lots of, lots of complexity associated with that.Um, yeah, there you go.[00:06:51] Jeremy: the V3 that, that was written in Java, I'm assuming this was a, as a complete rewrite. You, you didn't use the C++ code at all.[00:07:00] Randy: Yeah. And, uh, it was, um, we migrated, uh, page by page. So, uh, you know, in the transition period, which lasted probably five years, um, there were pages, you know, in the beginning, all pages were served by V2. In the end, all pages are served by V3 and, you know, over time you iterate and you like rewrite in parallel, you know, rewrite and maintain in parallel the V3 version of XYZ page and the V2 version of XYZ page.Um, and then when you're ready, you start to test out at low percentages of traffic, you know, what would, what does V3 look like? Is it correct? And when it isn't do you go and fix it, but then ultimately you migrate the traffic over, um, to fully take, get fully be in the V3 world, and then you, you know, remove or comment out or whatever.The, the code that supported that in the V2 monolith.[00:07:54] Jeremy: And then you had mentioned using Oracle databases. Did you have a set for V2 and a separate V3 and you were kind of trying to keep them in sync?[00:08:02] Randy: Oh, great question. Thank you for asking that question. No, uh, no. We had the databases. Um, so again, as I mentioned, we had pre-demonolith that's my that's a technical term, uh, pre broken up the databases starting in, let's call it 2000. Uh, actually I'm almost certain that's 2000. Cause we had a major site outage in 1999, which everybody still remembers who was there at the time.Uh wasn't me or I wasn't there at the time. Uh, but you know, you can look it up. Uh, anyway, so yeah, starting in 2000, we broke up that monolithic database into what I was telling you before those entity aligned databases. Again, one set for items, one set for users, one set for transactions, you know, dot dot, dot, um, and that division of those databases was shared.You know, those databases were shared between. The three using those things and then V sorry, V2 using those things and V3 using those things. Um, and then, you know, so we've completely decoupled the rewrite of the database, you know, kind of data storage layer from the rewrite of the application layer, if that makes sense.[00:09:09] Jeremy: Yeah. So, so you had V2 that was connecting to these individual Oracle databases. You said like they were for different types of entities, like maybe for items and users and things like that. but it was a shared database situation where V2 was connected to the same database as V3. Is that right?[00:09:28] Randy: Correct and also in V3, even when done. Different V3 applications, were also connecting to the same database, again, like anybody who used user, anybody who used the user entity, which is a lot we're connecting to the user suite of databases and anybody who used the item entity, which again is a lot, um, you were connecting to the item databases, et cetera.So yeah, it was this many to many that's, I'm trying to say many to many relationship between applications in the V3 world and databases.[00:10:00] Jeremy: Okay. Yeah, I think I, I got it because[00:10:03] Randy: It's easier with a diagram.[00:10:04] Jeremy: yeah. W 'cause when you, when you think about services now, um, you think of services having dependencies on other services. Whereas in this case you would have multiple services that rather than talking to a different service, they would all just talk to the same database.They all needed users. So they all needed to connect to the user's database.[00:10:24] Randy: Right exactly. And so, uh, I don't want to jump ahead in this conversation, but like the problems that everybody has, everybody who's feeling uncomfortable at the moment. You're right. To feel uncomfortable because that wasn't unpleasant situation and microservices, or more generally the idea that individual services would own their own data.And only in the only are interactions to the service would be through the service interface and not like behind the services back to the, to the data storage layer. Um, that's better. And Amazon discovered that, you know, uh, lots of people discovered that around that same, around that same early two thousands period.And so yeah, we had that situation at eBay at the time. Uh, it was better than it was before. Right, right. Better than a monolithic database and a monolithic application layer, but it definitely also had issues. Uh, as you can imagine,[00:11:14] Jeremy: you know, thinking about back to that time where you were saying it's better than a monolith, um, what were sort of the trade-offs of, you know, you have a monolith connecting to all these databases versus you having all these applications, connecting to all these databases, like what were the things that you gained and what did you lose if that made sense?[00:11:36] Randy: Hmm. Yeah. Well, I mean, why we did it in the first place is develop is like isolation between development teams right? So we were looking for developer productivity or the phrase we used to use was feature velocity, you know, so how quickly would we be able to move? And to the extent that we could move independently, you know, the search team could move independently from the buying team, which could move independently from the selling team, et cetera.Um, that was what we were gaining. Um, what were we losing? Uh, you know, when you're in a monolith situation, If there's an issue, you know, where it is, it's in the monolith. You might not know where in the monolith. Um, but like there's only one place that could be. And so an issue that one has, uh, when you break things up into smaller units, uh, especially when they have this, you know, shared, shared mutable state, essentially in the form of these databases, like who changed that column?What, you know, what's the deal. Uh, actually we did have a solution for that or something that really helped us, which was, um, now 20, more than 20 years ago, we had something that we would now call distributed tracing where, uh, actually I talked about this way back in the 2007 thing, cause it was pretty cool, uh, at the time, uh, You know, just like the spans one would create using a modern distributed tracing, you know, open telemetry or, you know, any of the disruptive tracing vendors.Um, just like you would do that. We, we didn't use the term span, but that same idea where, um, we could, and the goal was the same to like debug stuff. So, uh, every time we were about to make a database call, we would say, Hey, I'm about to make this data, you know, we would log we about to make this database call and then it would happen.And then we would log whether it was successful or not successful. We could see how long it took, et cetera. Um, and so we built our own, you know, monitoring system, which, which we called central application logging or CAL, uh, totally proprietary to eBay. I'm happy to talk about whatever gory details you want to know about that, but it was pretty cool certainly way back in 2000.It was, and that was our mitigation against the thing I'm telling you, which is, you know, when something, when not. Something is weird in the database. We can kind of back up and figure out where it might've happened, or things are slow. What's, you know, what's the deal. And, uh, you know, cause sometimes the database is slow for reasons.Um, and what, which, what thing is, you know, from an application perspective, I'm talking to 20 different databases, but things are slow. Like what is it? And, um, CAL helped us to, to figure out both elements of that, right? Like what applications are talking to, what databases and what backend services and like debug and diagnose from that perspective.And then for a given application, what, you know, databases in backend services are you talking to? And, um, debug that. And then we have the whole, and then we, um, we, we had monitors on those things and we would notice when databases would, where be a lot of errors or where, when database is starting in slower than they used to be.Um, and then. We implemented what people would now call circuit breakers, where we would notice that, oh, you know, everybody who's trying to talk to database 1, 2, 3, 4 is seeing it slow down. I guess 1, 2, 3, 4 is unhappy. So now flip everybody to say, don't talk to 1, 2, 3, 4, and like, just that kind of stuff.You're not going to be able to serve. Uh, but whatever, that's better than stopping everything. So I hope that makes sense. Like, you know, so all these, all these like modern resilience techniques, um, we always had, we had our own proprietary names for them, but you know, we, we implemented a lot of them way back when,[00:15:22] Jeremy: Yeah. And, and I guess just to contextualize it for the audience, I mean, this was back in 2004. Oh it back in 2000.[00:15:32] Randy: Again, because we had this, sorry to interrupt you because we have, the problem is that we were just talking about where application many applications are talking to many services and databases and we didn't know what was going on. And so we needed some visibility into what was going on.Sorry, go ahead.[00:15:48] Jeremy: yeah. Okay. So all the way back in 2000, there's a lot less, Services out there, like nowadays you think about so many software as a service products. if you were building the same thing today, what are some of the services that people today would just go and say like, oh, I'll just, I'll just pay for this and have this company handle it for me. You know, that wasn't available, then[00:16:10] Randy: sure. Well, there. No, essentially, no. Well, there was no cloud cloud didn't happen until 2006. Um, and there were a few software as a service vendors like Salesforce existed at the time, but they weren't usable in the way you're thinking of where I could give you money and you would operate a technical or technological software service on my behalf.Do you know what I mean? So we didn't have any of the monitoring vendors. We didn't have any of the stuff today. So yeah. So what would we do, you know, to solve that specific problem today? Uh, I would, as we do today, I would, uh, instrument everything with open telemetry because that's generic. Thank you, Ben Siegelman and LightStep for starting that whole open sourcing process, uh, of that thing and, and, um, getting all the vendors to, you know, respect it.Um, and then I would shoot, you know, for my backend, I would choose one of the very many wonderful, uh, you know, uh, distributed tracing vendors of which there are so many, I can't remember, but like LightStep is one honeycomb... you know, there were a bunch of, uh, you know, backend, um, distributed tracing vendors in particular, you know, for that.Uh, what else do you have today? I mean, we could go on for hours on this one, but like, we didn't have distributed logging or we didn't have like logging vendors, you know? So there was no, uh, there was no Splunk, there was no, um, you know, any, any of those, uh, any of the many, uh, distributed log, uh, or centralized logging vendor, uh, vendors.So we didn't have any of those things. We didn't. like caveman, you know, we rent, we, uh, you know, had our own data. We built our own data centers. We racked our own servers. We installed all the OSS in them, you know, uh, by the way, we still do all that because it's way cheaper for us at our scale to do that.But happy to talk about that too. Uh, anyway, but yeah, no, the people who live in, I don't know if this is where you want to go in 2022, the software developer has this massive menu of options. You know, if you only have a credit card, uh, and it doesn't usually cost that much, you can get a lot of stuff done from the cloud vendors, from the software service vendors, et cetera, et cetera.And none of that existed in 2000.[00:18:31] Jeremy: it's really interesting to think about how different, I guess the development world is now. Like, cause you mentioned how cloud wasn't even really a thing until 2006, all these, these vendors that people take for granted. Um, none of them existed. And so it just, uh, it must've been a very, very different time.[00:18:52] Randy: Well, we didn't know. It was every, every year is better than the previous year, you know, in software every year. You know? So at that time we were really excited that we had all the tools and capabilities that, that we did have. Uh, and also, you know, you look back from, you know, 20 years in the future and, uh, you know, it looks caveman, you know, from that perspective.But, uh, it was, you know, all those things were cutting edge at the time. What happened really was the big companies rolled their own, right. Everybody, you know, everybody built their own data centers, rack their own servers. Um, so at least at scale and the best you could hope for the most you could pay anybody else to do is rack your servers for you.You know what I mean? Like there were external people, you know, and they still exist. A lot of them, you know, the Rackspaces you know Equinixes, et cetera of the world. Like they would. Have a co-location facility. Uh, and you, you know, you ask them please, you know, I'd like to buy the, these specific machines and please rack these specific machines for me and connect them up on the network in this particular way.Um, that was the thing you could pay for. Um, but you pretty much couldn't pay them to put software on there for you. That was your job. Um, and then operating. It was also your job, if that makes sense.[00:20:06] Jeremy: and then back then, would that be where. Employees would actually have to go to the data center and then, you know, put in their, their windows CD or their Linux CD and, you know, actually do everything right there.[00:20:18] Randy: Yeah. 100%. Yeah. In fact, um, again, anybody who operates data centers, I mean, there's more automation, but the conceptually, when we run three data centers ourselves at eBay right now, um, and all of our, all of our software runs on them. So like we have physical, we have those physical data centers. We have employees that, uh, physically work in those things, physical.Rack and stack the servers again, we're smarter about it now. Like we buy a whole rack, we roll the whole rack in and cable it, you know, with one big chunk, uh, sound, uh, as distinct from, you know, individual wiring and the networks are different and better. So there's a lot less like individual stuff, but you know, at the end of the day, but yeah, everybody in quotes, everybody at that time was doing that or paying somebody to do exactly that.Right. Yeah.[00:21:05] Jeremy: Yeah. And it's, it's interesting too, that you mentioned that it's still being done by eBay. You said you have three, three data centers. because it seems like now maybe it's just assumed that someone's using a cloud service or using AWS or whatnot. And so, oh, go ahead.[00:21:23] Randy: I was just going to say, well, I'm just going to riff off what you said, how the world has changed. I mean, so much, right? So. Uh, it's fine. You didn't need to say my whole LinkedIn, but like I used to work on Google cloud. So I've been, uh, I've been a cloud vendor, uh, at a bunch of previous companies I've been a cloud consumer, uh, at stitch fix and we work in other places.Um, so I'm fully aware, you know, fully, fully, personally aware of, of all that stuff. But yeah, I mean, there's this, um, you know, eBay is in the, uh, eBay is at the size where it is actually. Cost-effective very, cost-effective, uh, can't tell you more than that, uh, for us to operate our own, um, uh, our own infrastructure, right?So, you know, you know, one would expect if Google didn't operate their own infrastructure, nobody would expect Google to use somebody else's right. Like that, that doesn't make any economic sense. Um, and, uh, you know, Facebook is in the same category. Uh, for a while, Twitter and PayPal have been in that category.So there's like this clap, you know, there are the known hyperscalers, right. You know, the, the Google, Amazon, uh, Microsoft that are like cloud vendors in addition to consumers internally have their own, their own clouds. Um, and then there's a whole class of other, um, places that operate their own internal clouds in quotes.Uh, but don't offer them externally and again, uh, Facebook or Meta, uh, you know, is one example. eBay's another, you know, there's a, I'm making this up. Dropbox actually famously started in the cloud and then found it was much cheaper for them to operate their own infrastructure again, for the particular workloads that they had.Um, so yeah, there's probably, I'm making this up. Let's call it two dozen around the world of these, I'm making this term up many hyperscalers, right? Like self hyperscalers or something like that. And eBay's in that category.[00:23:11] Jeremy: I know this is kind of a, you know, a big what if, but you were saying how once you reach a certain scale, that's when it makes sense to move into your own data center. And, uh, I'm wondering if, if E-bay, had started more recently, like, let's say in the last, you know, 10 years, I wonder if it would've made sense for it to start on a public cloud and then move to, um, you know, its own infrastructure after it got bigger, or if you know, it really did make sense to just start with your own infrastructure from the start.[00:23:44] Randy: Oh, I'm so glad you asked that. Um, the, the answer is obvious, but like, I'm so glad you asked that because I love to make this point. No one should ever, ever start by building your own servers and your own (laughs) cloud. Like, No, there's be, uh, you should be so lucky (laughs) after years and years and years that you outgrow the cloud vendors.Right. Um, it happens, but it doesn't happen that often, you know, it happens so rarely that people write articles about it when it happens. Do you know what I mean? Like Dropbox is a good example. So yes, 100% anytime. Where are we? 2022. Any time in, more than the last 10 years? Um, yeah, let's call it. Let's call it 2010, 2012.Right. Um, when cloud had proved itself over and you know, many times over, um, anybody who starts since that time should absolutely start in the public cloud. There's no argument about it. Uh, and again, one should be so lucky that over time, you're seeing successive zeros added to your cloud bill, and it becomes so many zeros that it makes sense to shift your focus toward building and operating your own data centers.That's it. I haven't been part of that transition. I've been the other way, you know, at other places where, you know, I've migrated from owned data centers and colos into, into public cloud. Um, and that's the, that's the more common migration. And again, there are, there are a handful, maybe not even a handful of, uh, companies that have migrated away, but when they do, they've done all the math, right.I mean, uh, Dropbox has done some great, uh, talks and articles about, about their transition and boy, the math makes sense for them. So, yeah.[00:25:30] Jeremy: Yeah. And it also seems like maybe it's for certain types of businesses where moving off of public cloud. Makes sense. Like you mentioned Dropbox where so much of their business is probably centered around storage or centered around, you know, bandwidth and, you know, there's probably certain workloads that it's like need to leave public cloud earlier.[00:25:51] Randy: Um, yeah, I think that's fair. Um, I think that, I think that's a, I think that's an insightful comment. Again, it's all about the economics at some point, you know, it's a big investment to, uh, uh, and it takes years to develop the intern, forget the money that you're paying people, but like just to develop the internal capabilities.So they're very specialized skill sets around building an operating data centers. So like it's a big deal. Um, and, uh, yeah. So are there particular classes of workloads where you would for the same dollar figure or whatever, uh, migrate earlier or later? I'm sure that's probably true. And again, what can absolutely imagine?Well, when they say Dropbox in this example, um, yeah, it's because like they, they need to go direct to the storage. And then, I mean, like, they want to remove every middle person, you know, from the flow of the bytes that are coming into the storage media. Um, and it makes perfect sense for, for them. And when I understood what they were doing, which was a number of years ago, they were hybrid, right. So they had, they had completely, you know, they kept the top, you know, external layer, uh, in public cloud. And then the, the storage layer was all custom. I don't know what they do today, but people could check.[00:27:07] Jeremy: And I'm kind of coming back to your, your first time at eBay. is there anything you felt that you would've done differently with the knowledge you have now?but with the technology that existed, then.[00:27:25] Randy: Gosh, that's the 20, 20 hindsight. Um, the one that comes to mind is the one we touched on a little bit, but I'll say it more starkly, the. If I could, if I could go back in time 20 years and say, Hey, we're about to do this V3 transition at eBay. I would not. I would have had us move directly to what we would now call microservices in the sense that individual services own their own data storage and are only interacted with through the public interface.Um, there's a famous Amazon memo around that same time. So Amazon did the transition from a monolith into what we would now call microservices over about a four or five-year period, 2000 to 2005. And there was a famous Jeff Bezos memo from the early part of that, where, you know, seven, you know, requirements I can't remember them, but you know, essentially it was, you may, you may, you may never, you may never talk to anybody else's database. You may only interact with other services through their public interfaces. I don't care what those public interfaces are, so they didn't standardize around. You know, CORBA or JSON or GRPC, which didn't exist at the time, you know, like they didn't standardize around any, any particular, uh, interaction mechanism, but you did need to again, have this kind of microservice capability, that's modern terminology, um, uh, where, you know, the only services own their own data and nobody can talk in the back door.So that is the one architectural thing that I wish, you know, with 2020 hindsight, uh, that I would bring back in my time travel to 20 years ago, because that would help. That does help a lot. And to be fair, Amazon, um, Amazon was, um, pioneering in that approach and a lot of people internally and externally from Amazon, I'm told, didn't think it would work, uh, and it, and it did famously.So that's, that's the thing I would do.[00:29:30] Jeremy: Yeah. I'm glad you brought that up because, when you had mentioned that, I think you said there were 220 applications or something like that at certain scales, people might think like, oh, that sounds like microservices to me. But when you, you mentioned that microservice to you means it having its own data store.I think that's a good distinction.[00:29:52] Randy: Yeah. So, um, I talk a lot about microservices that have for, for a decade or so. Yeah. I mean, several of the distinguishing characteristics are the micro in microservices is size and scope of the interface, right? So you can have a service oriented architecture with one big service, um, or some very small number of very large services.But the micro in microservice means this thing does, maybe it doesn't have one operation, but it doesn't have a thousand. The several or the handful or several handfuls of operations are all about this one particular thing. So that's the one part of it. And then the other part of it that is critical to the success of that is owning the, owning your own data storage.Um, so each service, you know, again, uh, it's hard to do this with a diagram, but like imagine, imagine the bubble of the service surrounding the data storage, right? So like people, anybody from the outside, whether they're interacting synchronously, asynchronously, messaging, synchronous, whatever HTTP doesn't matter are only interacting to the bubble and never getting inside where the, uh, where the data is I hope that makes sense.[00:31:04] Jeremy: Yeah. I mean, I mean, it's a kind of in direct contrast to before you're talking about how you had all these databases that all of these services shared. So it was probably hard to kind of keep track of, um, who had modified data. Um, you know, one service could modify it, then another service control to get data out and it's been changed, but it didn't change it.So it could be kind of hard to track what's going on.[00:31:28] Randy: Yeah, exactly. Inner integration at the database level is something that people have been doing since probably the 1980s. Um, and so again, I, you know, in retrospect it looks like caveman approach. Uh, it was pretty advanced at the time, actually, even the idea of sharding of, you know, Hey, there are users and the users live in databases, but they don't all live in the same one.Uh, they live in 10 different databases or 20 different databases. And then there's this layer that. For this particular user, it figures out which of the 20 databases it's in and finds it and gets it back. And, um, you know, that was all pretty advanced. And by the way, that's all those capabilities still exist.They're just hidden from everybody behind, you know, nice, simple, uh, software as a service, uh, interfaces anyway, but that takes nothing away from your excellent point, which is, yeah. It's, you know, when you're, again, when there's many to many to relations, when there is this many to many relationship between, um, uh, applications and databases, uh, and there's shared mutable state in those databases that when is shared, like that's bad, you know, it's not bad to have state.It's not bad to have mutable state it's bad to have shared beautiful state.[00:32:41] Jeremy: Yeah. And I think anybody who's kind of interested in learning more about the, you had talked about sharding and things like that. If they go back and listen to your, your first appearance on software engineering radio, um, yeah. It kind of struck me how you were talking about sharding and how it was something that was kind of unique or unusual.Whereas today it feels like it's very, I don't know, if quaint is the right word, but it's like, um, it's something that, that people kind of are accustomed to now.[00:33:09] Randy: Yeah. Yeah. Um, it's obvious. Um, it seems obvious in retrospect. Yeah. You know, at the time, and by the way, he didn't invent charting. As I said, in 2007, you know, Google and Yahoo and, uh, Amazon, and, you know, it was the obvious, it took a while to reach it, but it's one of those things where once, once people have the, you know, brainwave to see, oh, you know what, we don't actually have to stop store this in one, uh, database.We can, we can chop that database up into, you know, into chunks. And that, that looks similar to that herself similar. Um, yeah, that was, uh, that was, uh, that was reinvented by lots of, uh, Lots of the big companies at the same time again, because everybody was solving that same problem at the same time. Um, but yeah, when you look back and you, I mean, like, and honestly, like everything that I said there, it's still like this, all the techniques about how you shard things.And there's lots of, you know, it's not interesting anymore because the problems have been solved, but all those solutions are still the solutions, if that makes any sense, but you know,[00:34:09] Jeremy: Yeah, for sure. I mean, I think anybody who goes back and listens to it. Yeah. Like you said, it's, it's, it's very interesting because it's. it all still applies and it's like, I think the, the solutions that are kind of interesting to me are ones where it's, it's things that could have been implemented long ago, but we just later on realized like, this is how we could do it.[00:34:31] Randy: Well part of it is, as we grow as an industry, we just, we discover new problems. You know, we, we get to the point where, you know, sharding over databases has only a problem when one database doesn't work. You know, when it, when you're the load that you put on that database is too big, or you want the availability of, you know, multiple.Um, and so that's not a, that's not a day one problem, right? That's a day two or day 2000 and kind of problem. Right. Um, and so a lot of these things, yeah, well, you know, it's software. So like we could have done, we could have done any of these things in older languages and older operating systems and with older technology.But for the most part, we didn't have those problems or we didn't have them at sufficiently enough. People didn't have the problem that we, you know, um, for us to have solved it as an industry, if that makes any sense.[00:35:30] Jeremy: yeah, no, that's a good point because you think about when Amazon first started and it was just a bookstore, right. And the number of people using the site where, uh, who knows it was, it might've been tens a day or hundreds a day. I don't, I don't know. And, and so, like you said, the problems that Amazon has now in terms of scale are just like, it's a completely different world than when they started.[00:35:52] Randy: Yeah. I mean, probably I'm making it up, but I don't think that's too off to say that it's a billion times more, their problems are a billion fold. You know, what they, what they were[00:36:05] Jeremy: the next thing I'd like to talk about is you came back to eBay I think about has it been about two years ago.[00:36:14] Randy: Two years yeah.[00:36:15] Jeremy: Yeah. And, and so, so tell me about the experience of coming back to an organization that you had been at, you know, 10 years prior or however long it was like, how is your onboarding different when it's somewhere you've been before?[00:36:31] Randy: Yeah. Sure. So, um, like, like you said, I worked at eBay from 2004 to 2011. Um, and I worked in a different role than I have today. I've worked mostly on eBay search engine. Um, and then, uh, I left to co-found a startup, which was in the 99%. So the one, you know, like didn't really do much. Uh, I joined, I worked at Google in the early days of Google cloud, as I mentioned on Google app engine and had a bunch of other roles including more recently, like you said, stitch fix and we work, um, leading those engineering teams.And, um, so yeah, coming back to eBay as chief architect and, and, you know, leading. Developer platform, essentially a part of eBay. Um, yeah. What was the onboarding like? I mean, lots of things had changed, you know, in the, in the intervening 10 years or so. Uh, and lots had stayed the same, you know, not in a bad way, but just, you know, uh, some of the technologies that we use today are still some of the technologies we used 10 years ago, a lot has changed though.Um, a bunch of the people are still around. So there's something about eBay that, um, people tend to stay a long time. You know, it's not really very strange for people to be at eBay for 20 years. Um, in my particular team of let's call it 150, there are four or five people that have crossed their 20 year anniversary at the company.Um, and I also re I rejoined with a bunch of other boomerangs as the term we use internally. So it's, you know, the, um, including the CEO, by the way. So sort of bringing the band back together, a bunch of people that had gone off and worked at it, but at other places have, have come back for various reasons over the last couple of.So it was both a lot of familiarity, a lot of unfamiliarity, a lot of familiar faces. Um, yup.[00:38:17] Jeremy: So, I mean, having these people who you work with still be there and actually coming back with some of those people, um, what were some of the big, I guess, advantages or benefits you got from, you know, those existing connections?[00:38:33] Randy: Yeah. Well, I mean, as with all things, you know, imagine, I mean, everybody can imagine like getting back together with friends that they had from high school or university, or like you had some people had some schooling at some point and like you get back together with those friends and there's this, you know, there's this implicit trust in most situations of, you know, because you went through a bunch of stuff together and you knew each other, uh, you know, a long time.And so that definitely helps, you know, when you're returning to a place where again, there are a lot of familiar faces where there's a lot of trust built up. Um, and then it's also helpful, you know, eBay's a pretty complicated place and it's 10 years ago, it was too big to hold in any one person's head and it's even harder to hold it in one person said now, but to be able to come back and have a little bit of that, well, more than a little bit of that context about, okay, here's how eBay works.And here, you know, here are the, you know, unique complexities of the marketplace cause it's very unique, you know, um, uh, in the world. Um, and so, yeah, no, I mean, it was helpful. It's helpful a lot. And then also, you know, in my current role, um, uh, my, my main goal actually is to just make all of eBay better, you know, so we have about 4,000 engineers and, you know, my team's job is to make all of them better and more productive and more successful and, uh, being able to combine.Knowing what eBay, knowing the context about eBay and having a bunch of connections to the people that, you know, a bunch of the leaders there, uh, here, um, combining that with 10 years of experience doing other things at other places, you know, that's helpful because you know, now there are things that we do at eBay that, okay, well there, you know, you know, that this other place is doing, this has that same problem and is solving it in a different way.And so maybe we should, you know, look into that option. So,[00:40:19] Jeremy: so, so you mentioned just trying to make developers, work or lives easier. you start the job. How do you decide what to tackle first? Like how do you figure out where the problems are or what to do next?[00:40:32] Randy: yeah, that's a great question. Um, so, uh, again, my, uh, I lead this thing that we internally called the velocity initiative, which is about just making us, giving us the ability to deliver. Features and bug fixes more quickly to customers. Right. And, um, so what do I figure for that problem? How can we deliver things more quickly to customers and improve, you know, get more customer value and business value?Uh, what I did, uh, with, in collaboration with a bunch of people is what one would call a value stream map. And that's a term from lean software and lean manufacturing, where you just look end to end at a process and like say all the steps and how long those steps take. So a value stream, as you can imagine, like all these steps that are happening at the end, there's some value, right?Like we produced some, you know, feature or, you know, hopefully gotten some revenue or like helped out the customer and the business in some way. And so value, you know, mapping that value stream. That's what it means. And, um, Looking for you look at that. And when you can see the end-to-end process, you know, and like really see it in some kind of diagram, uh, you can look for opportunities like, oh, okay, well, you know, if it takes us, I'm making this effort, it takes us a week from when we have an idea to when it shows up on the site.Well, you know, some of those steps take five minutes. That's not worth optimizing, but some of those steps take, you know, five days and that is worth optimizing. And so, um, getting some visibility into the system, you know, looking end to end with some, with a kind of view of the system systems thinking, uh, that will give you the, uh, the knowledge about, or the opportunities about we know what can be improved.And so that's, that's what we did. And we didn't talk with all 4,000, you know, uh, engineers are all, you know, whatever, half a thousand teams or whatever we had. Um, but we sampled. And after we talked with three teams who were already hearing a bunch of the same things, you know, so we were hearing in the whole product life cycle, which I like to divide into four stages.I'd like to say, there's planning. How does an idea become a project or a thing that people work on a software development? How does a project or become committed code software delivery? How does committed code become a feature that people actually use? And then what I call post release iteration, which is okay, it's now there are out there on the site and we're turning it on and off for individual users.We're learning in analytics and usage in the real world and, and experimenting. And so there were opportunities that eBay at all, four of those stages, um, which I'm happy to talk about, but what we ended up seeing again and again, uh, is that that software delivery part was our current bottleneck. So again, that's the, how long does it take from an engineer when she commits her code to, it shows up as a feature on the site.And, you know, before we started the work. You know, two years ago before we started the work that I've been doing for the last two years with a bunch of people, um, on average and eBay was like a week and a half. So, you know, it'd be a week and a half between when someone's finished and then, okay. It gets code reviewed and, you know, dot, dot, dot it gets rolled out.It gets tested, you know, all that stuff. Um, it was, you know, essentially 10 days. And now for the teams that we've been working with, uh, it's down to two. So we used a lot of, um, what people may be familiar with, uh, the accelerate book. So it's called accelerate by Nicole Forsgren. Um, Jez humble and Gene Kim, uh, 2018, like if there's one book anybody should read about software engineering, it's that?Uh, so please read accelerate. Um, it summarizes almost a decade of research from the state of DevOps reports, um, which the three people that I mentioned led. So Nicole Forsgren, you know, is, uh, is a doctor, uh, you know, she's a PhD and, uh, data science. She knows how to do all this stuff. Um, anyway, so, uh, that when your, when your problem happens to be software delivery.The accelerate book tells you all the kind of continuous delivery techniques, trunk based development, uh, all sorts of stuff that you can do to, to solve that, uh, solve those problems. And then there are also four metrics that they use to measure the effectiveness of an organization, software delivery. So people might be familiar with, uh, there's deployment frequency.How often are we deploying a particular application lead time for change? That's that time from when a developer commits her code to when it shows up on the site, uh, change failure rate, which is when we deploy code, how often do we roll it back or hot fix it, or, you know, there's some problem that we need to, you know, address.Um, and then, uh, meantime to re uh, meantime to restore, which is when we have one of those incidents or problems, how, how quickly can we, uh, roll it back or do that hot fix? Um, and again, the beauty of Nicole Forsgren research summarized in the accelerate book is that the science shows that companies cluster, in other words, Mostly the organizations that are not good at, you know, deployment frequency and lead time are also not good at the quality metrics of, uh, meantime to restore and change failure rate and the companies that are excellent at, you know, uh, deployment frequency and lead time are also excellent at meantime, to recover and, uh, change failure rate.Um, so companies or organizations, uh, divided into these four categories. So there's a low performers, medium performers, high performers, and then elite performers. And, uh, eBay was solidly eBay on average at the time. And still on average is solidly in that medium performer category. So, uh, and what we've been able to do with the teams that we've been working with is we've been able to move those teams to the high category.So just super brief. Uh, and I w we'll give you a chance to ask you some more questions, but like in the low category, all those things are kind of measured in months, right. So how long, how often are we deploying, you know, measure that in months? How long does it take us to get a commit to the site? You know, measure that in months, you know, um, where, and then the low performer, sorry.Uh, the medium performers are like, everything's measured in weeks, right? So like, if we were deploy, you know, couple, you know, once every couple of weeks or once a week, uh, lead time is measured in weeks, et cetera. The, uh, the high-performers things are measured in days and the elite performance things are measured in hours.And so you can see there's like order of magnitude improvements when you go from, you know, when you move from one of those kind of clusters to another cluster. Anyway. So what we were focused on again, because our problem was software delivery was moving a whole, a whole set of teams from that medium performer category where things are measured in weeks to the, uh, high-performer category, where things are missing.[00:47:21] Jeremy: throughout all this, you said the, the big thing that you focused on was the delivery time. So somebody wrote code and, they felt that it was ready for deployment, but for some reason it took 10 days to actually get out to the actual site. So I wonder if you could talk a little bit about, uh, maybe a specific team or a specific application, where, where, where was that time being spent?You know, you, you said you moved from 10 days to two days. What, what was happening in the meantime?[00:47:49] Randy: Yeah, no, that's a great question. Thank you. Um, yeah, so, uh, okay, so now, so we, we, we looked end to end of the process and we found that software delivery was the first place to focus, and then there are other issues in other areas, but we'll get to them later. Um, so then for, um, to improve software delivery, now we asked individual teams, we, we, we did something like, um, you know, some conversation like I'm about to say, so we said, hi, it looks like you're deploying kind of once or twice.If I, if I told you, you had to deploy once a day, tell me all the reasons why that's not going to work. And the teams are like, oh, of course, well, it's a build times take too long. And the deployments aren't automated and you know, our testing is flaky. So we have to retry it all the time and, you know, dot, dot, dot, dot, dot.And we said, great, you just gave my team, our backlog. Right. So rather than, you know, just coming and like, let's complain about it. Um, which the teams work it's legit for them to complain. Uh, I was a, you know, we were able, because again, the developer program or sorry, the developer platform, you know, is as part of my team, uh, we said, great, like you just gave us, you just told us all the, all your top, uh, issues or your impediments, as we say, um, and we're going to work on them with you.And so every time we had some idea about, well, I bet we can use Canary deployments to automate the deployment which we have now done. We would pilot that with a bunch of teams, we'd learn what works and doesn't work. And then we would roll that out to everybody. Um, So what were the impediments like? It was a little bit different for each individual team, but in some, it was, uh, the things we ended up focusing on or have been focusing on our build times, you know, so we build everything in Java still.Um, and, uh, even though we're generation five, as opposed to that generation three that I mentioned, um, still build times for a lot of applications we're taking way too long. And so we, we spend a bunch of time improving those things and we were able to take stuff from, you know, hours down to, you know, single digit minutes.So that's a huge improvement to developer productivity. Um, we made a lot of investment in our continuous delivery pipelines. Um, so making all the, making all the automation around, you know, deploying something to one environment and checking it there and then deploying it into a common staging environment and checking it there and then deploying it from there into the production environment.And, um, and then, you know, rolling it out via this Canary mechanism. We invested a lot in something that we call traffic mirroring, which is a, we didn't invent. Other T other places have a different name for this? I don't know if there's a standard industry name. Some people call it shadowing, but the idea is I have a change that I'm making, which is not intended to change the behavior.Like a lots of changes that we make, bug fixes, et cetera, uh, upgrading to new, you know, open source, dependencies, whatever, changing the version of the framework. There's a bunch of changes that we make regularly day-to-day as developers, which are like, refactorings kind of where we're not actually intending to change the behavior.And so a tra traffic mirroring was our idea of. You have the old code that's running in production and you, and you fire a request, a production request at that old code and it responds, but then you also fire that request at the new version and compare the results, you know, did the same, Jason come back, you know, between the old version and the new version.Um, and that's, that's a great way kind of from the outside to sort of black box detect any unintended changes in the, in the behavior. And so we definitely leveraged that very, very aggressively. Um, we've invested in a bunch of other bunch of other things, but, but all those investments are driven by what does the team, what do the particular teams tell us are getting in their way?And there are a bunch of things that the teams themselves have, you know, been motivated to do. So my team's not the only one that's making improvements. You know, teams have. Reoriented, uh, moved, moved from branching development to trunk based development, which makes a big difference. Um, making sure that, uh, PR approvals and like, um, you know, code reviews are happening much more regularly.So like right after, you know, a thing that some teams have started doing is like immediately after standup in the morning, everybody does all the code reviews that you know, are waiting. And so things don't drag on for, you know, two, three days, cause whatever. Um, so there's just like a, you know, everybody kind of works on that much more quickly.Um, teams are building their own automations for things like testing site speed and accessibility and all sorts of stuff. So like all the, all the things that, you know, a team goes through in the development and roll out of their software, they were been spending a lot of time automating and making, making a leaner, making more efficient.[00:52:22] Jeremy: So, so some of those, it sounds like the PR example is really, on the team. Like you're, you're telling them like, Hey, this is something that you internally should change how you work. for things like improving the build time and things like that. Did you have like a separate team that was helping these teams, you know, speed that process up? Or what, what was that [00:52:46] Randy: like?Yeah. Great. I mean, and you did give to those two examples are, are like you say, very different. So I'm going to start from, we just simply showed everybody. Here's your deployment frequency for this application? Here's your lead time for this application? Here's your change failure rate. And here's your meantime to restore.And again, as I didn't mention before. All of the state of DevOps research and the accelerate book prove that by improving those metrics, you get better engineering outcomes and you also get better business outcomes. So like it's scientifically proven that improving those four things matters. Okay. So now we've shown to teams, Hey, you're we would like you to improve, you know, for your own good, but you know, more broadly at eBay, we would like the deployment frequency to be faster.And we would like the lead time to be shorter. And the insight there is when we deploy smaller units of work, when we don't like batch up a week's worth of work, a month's worth of work, uh, it's much, much less risky to just deploy like an hour's worth of work. Right. And the, and the insight is the hours worth of work fits in your head.And if you roll it out and there's an issue. First off rolling backs, no big deal. Cause you only, you know, not, you've only lost an hour of work for a temporary period of time, but also like you never have this thing, like what in the world broke? Cause like with a month's worth of work, there's a lot of things that changed and a lot of stuff that could break, but with an hour's worth of work, it's only like one change that you made.So, you know, when, if something happens, like it's pretty much, pretty much guaranteed to be that thing anyway, that's the back. Uh, that's the backstory. And um, and so yeah, we were just working with individual teams. Oh yeah. So they were, the teams were motivated to like, see what's the biggest bang for the buck in order to improve those things.Like how can we improve those things? And again, some teams were saying, well, you know what, a huge component of our, of that lead time between when somebody commits and it's, it's a feature on the site, a huge percentage of that. Maybe multiple days, it's like waiting for somebody to code review. Okay, great.We can just change our team kind of agreements and our team behavior to make that happen. And then yes, to answer your question about. Were the other things like building the Canary capability and traffic mirroring and build time improvements. Those were done by central, uh, platform and infrastructure teams, you know, some of which were in my group and some of which are in peer peer groups, uh, in, in my part of the organization.So, yeah, so I mean like providing the generic tools and, you know, generic capabilities, those are absolutely things that a platform organization does. Like that's our job. Um, and you know, we did it. And, uh, and then there are a bunch of other things like that around kind of team behavior and how you approach building a particular application that are, are, and should be completely in the control of the individual teams.And we were trying not to be, not trying not to be, we were definitely not being super prescriptive. Like we didn't come in and we say, we didn't come in and say, alright, by next, by next Tuesday, we want you to be doing trunk based development by, you know, the Tuesday after that, we want to see test-driven development, you know, dot, dot, Um, we would just offer to teams, you know, hear it.Here's where you are. Here's where we know you can get, because like we work with other teams and we've seen that they can get there. Um, you know, they just work together on, well, what's the biggest bang for the buck and what would be most helpful for that team? So it's like a menu of options and you don't have to take everything off the menu, if that makes sense.[00:56:10] Jeremy: And, and how did that communication flow from you and your team down to the individual contributor? Like you have, I'm assuming you have engineering managers and technical leads and all these people sort of in the chain. How does it[00:56:24] Randy: Yeah, thanks for asking that. Yeah. I didn't really say how we work as an initiative. So every, um, so there are a bunch of teams that are involved. Um, and we have, uh, every Monday morning, so, uh, just so happens. It's late Monday morning today. So we already did this a couple of hours ago, but once a week we get all the teams that are involved, both like the platform kind of provider teams and also the product.Or we would say domain like consumer teams. And we do a quick scrum of scrums, like a big old kind of stand up. What have you all done this week? What are you working on next week? What are you blocked by kind of idea. And, you know, there are probably 20 or 30 teams again, across the individual platform capabilities and across the teams that, you know, uh, consume this stuff and everybody gives a quick update and they, and, uh, it's a great opportunity for people to say, oh, I have that same problem too.Maybe we should offline try to figure out how to solve that together. You built a tool that automates the site speed stuff. That's great. I would S I would so love to have that. And, um, so it, uh, this weekly meeting has been a great opportunity for us to share wins, share, um, you know, help that people need and then get, uh, get teams to help with each other.And also, similarly, one of the platform teams would say something like, Hey, we're about to be done or beta, let's say, you know, this new Canary capability, I'm making this up. Anybody wanna pilot that for us? And then you get a bunch of hands raised of yo, we would be very happy to pilot that that would be great.Um, so that's how we communicate back and forth. And, you know, it's a big enough. It's kind of like engineering managers are kind of are the kind of level that are involved in that typically. Um, so it's not individual developers, but it's like somebody on most, every team, if that makes any sense. So like, that's kind of how we do that, that like communication, uh, back to the individual developers.If that makes sense.[00:58:26] Jeremy: Yeah. So it sounds like you would have, like you said, the engineering manager go to the standup and um, you said maybe 20 to 30 teams, or like, I'm just trying to get a picture for how many people are in this meeting.[00:58:39] Randy: Yeah. It's like 30 or 40 people.[00:58:41] Jeremy: Okay. Yeah.[00:58:42] Randy: And again, it's quick, right? It's an hour. So we just go, boom, boom, boom, boom. And we've just developed a cadence of people. We have a shared Google doc and like people like write their little summaries, you know, of what they're, what they've worked on and what they're working on.So we've over time made it so that it's pretty efficient with people's time. And. Pretty dense in a good way of like information flow, back and forth. Um, and then also separately, we meet more in more detail with the individual teams that are involved. Again, try to elicit, okay, now, where are you now?Here's where you are. Please let us know what problems you're seeing with this part of the infrastructure or problems you're seeing in the pipelines or something like that. And we're, you know, we're constantly trying to learn and get better and, you know, solicit feedback from teams on what we can do differently.[00:59:29] Jeremy: earlier you had talked a little bit about how there were a few services that got brought over from V2 or V3, basically kind of more legacy or older services that are, have been a part of eBay for quite some time.And I was wondering if there were things about those services that made this process different, like, you know, in terms of how often you could deploy or, um, just what were some key differences between something that was made recently versus something that has been with the company for a long time?[01:00:06] Randy: Yeah, sure. I mean, the stuff that's been with the company for a long time was best in class. As of when we built it, you know, maybe 15 and sometimes 20 years ago. Um, there actually, I wouldn't even less than a handful. There are, as we speak, there are two or three of those V3. Uh, clusters or applications or services still around and they should be gone in a completely migrated away from, in the next a couple of months.So like, we're almost at the end of, um, you know, uh, moving all to more modern things. But yeah, you know, I mean, again, uh, stuff that was state-of-the-art, you know, 20 years ago, which was like deploying things once every two weeks, like that was a big deal in 2000 or 2004. Uh, and it's, you know, like that was fast in 2004 and is slow in 2022.So, um, yeah, I mean, what's the difference? Um, yeah, I mean, a lot of these things, you know, if they haven't already been migrated, there's a reason. And it's because often that they're way in the guts of something that's really important. You know, this is the, this is a core part. I'm making these examples up and they're not even right, but like it's a core part of the payments flow.It's a core part of, you know, uh, how, uh, sellers get paid. And those aren't examples. We have, those are modern, but you see what I'm saying? Like stuff that's like really core to the business and that's why it's kind of lasted.[01:01:34] Jeremy: And, uh, I'm kind of curious from the perspective of some of these new things you're introducing, like you're talking about, um, improving continuous delivery and things like that. Uh, when you're working with some of these services that have been around a long time, are the teams the rate at which they deploy or the rate at which you find defects is that noticeably different from services that are more recent?[01:02:04] Randy: I mean, and that's true of any legacy at any, at any place. Right? So, um, yeah, I mean, people are legitimately, uh, I have some trepidation that say about, you know, changing something that's
This episode originally aired on Software Engineering Radio.A few topics covered Building on top of open source Forking their GoTrue dependency Relying on Postgres features like row level security Adding realtime support based on Postgres's write ahead log Generating an API layer based on the database schema with PostgREST Creating separate EC2 instances for each customer's database How Postgres could scale in the future Monitoring postgres Common support tickets Permissive open source licenses Related Links @antwilson Supabase Supabase GitHub Firebase Airtable PostgREST GoTrue Elixir Prometheus VictoriaMetrics Logflare BigQuery Netlify Y Combinator Postgres PostgreSQL Write-Ahead Logging Row Security Policies pg_stat_statements pgAdmin PostGIS Amazon Aurora Transcript You can help edit this transcript on GitHub. [00:00:00] Jeremy: Today I'm talking to Ant Wilson, he's the co-founder and CTO of Supabase. Ant welcome to software engineering radio.[00:00:07] Ant: Thanks so much. Great to be here. [00:00:09] Jeremy: When I hear about Supabase, I always hear about it in relation to two other products. The first is Postgres, which is a open source relational database. And second is Firebase, which is a backend as a service product from Google cloud that provides a no SQL data store.It provides authentication and authorization. It has a functions as a service component. It's really meant to be a replacement for you needing to have your own server, create your own backend. You can have that all be done from Firebase. I think a good place for us to start would be walking us through what supabase is and how it relates to those two products.[00:00:55] Ant: Yeah. So, so we brand ourselves as the open source Firebase alternativethat came primarily from the fact that we ourselves do use the, as the alternative to Firebase. So, so my co-founder Paul in his previous startup was using fire store. And as they started to scale, they hit certain limitations, technical scaling limitations and he'd always been a huge Postgres fan.So we swapped it out for Postgres and then just started plugging in. The bits that we're missing, like the real-time streams. Um, He used the tool called PostgREST with a T for the, for the CRUD APIs. And sohe just built like the open source Firebase alternative on Postgres, And that's kind of where the tagline came from.But the main difference obviously is that it's relational database and not a no SQL database which means that it's not actually a drop-in replacement. But it does mean that it kind of opens the door to a lot more functionality actually. Um, Which, which is hopefully an advantage for us. [00:02:03] Jeremy: it's a, a hosted form of Postgres. So you mentioned that Firebase is, is different. It's uh NoSQL. People are putting in their, their JSON objects and things like that. So when people are working with Supabase is the experience of, is it just, I'm connecting to a Postgres database I'm writing SQL.And in that regard, it's kind of not really similar to Firebase at all. Is that, is that kind of right?[00:02:31] Ant: Yeah, I mean, the other thing, the other important thing to notice that you can communicate with Supabase directly from the client, which is what people love about fire base. You just like put the credentials on the client and you write some security rules, and then you just start sending your data. Obviously with supabase, you do need to create your schema because it's relational.But apart from that, the experience of client side development is very much the same or very similar the interface, obviously the API is a little bit different. But, but it's similar in that regard. But I, I think, like I said, we're moving, we are just a database company actually. And the tagline, just explained really, well, kind of the concept of, of what it is like a backend as a service. It has the real-time streams. It has the auth layer. It has the also generated APIs. So I don't know how long we'll stick with the tagline. I think we'll probably outgrow it at some point. Um, But it does do a good job of communicating roughly what the service is.[00:03:39] Jeremy: So when we talk about it being similar to Firebase, the part that's similar to fire base is that you could be a person building the front end part of the website, and you don't need to necessarily have a backend application because all of that could talk to supabase and supabase can handle the authentication, the real-time notifications all those sorts of things, similar to Firebase, where we're basically you only need to write the front end part, and then you have to know how to, to set up super base in this case.[00:04:14] Ant: Yeah, exactly. And some of the other, like we took w we love fire based, by the way. We're not building an alternative to try and destroy it. It's kind of like, we're just building the SQL alternative and we take a lot of inspiration from it. And the other thing we love is that you can administer your database from the browser.So you go into Firebase and you have the, you can see the object tree, and when you're in development, you can edit some of the documents in real time. And, and so we took that experience and effectively built like a spreadsheet view inside of our dashboard. And also obviously have a SQL editor in there as well.And trying to, create this, this like a similar developer experience, because that's where Firebase just excels is. The DX is incredible. And so we, we take a lot of inspiration from it in, in those respects.[00:05:08] Jeremy: and to to make it clear to our listeners as well. When you talk about this interface, that's kind of like a spreadsheet and things like that. I suppose it's similar to somebody opening up pgAdmin, I suppose, and going in and editing the rows. But, but maybe you've got like another layer on top that just makes it a little more user-friendly a little bit more like something you would get from Firebase, I guess.[00:05:33] Ant: Yeah.And, you know, we, we take a lot of inspiration from pgAdmin. PG admin is also open source. So I think we we've contributed a few things and, or trying to upstream a few things into PG admin. The other thing that we took a lot of inspiration from for the table editor, what we call it is airtable.And because airtable is effectively. a a relational database and that you can just come in and, you know, click to add your columns, click to add a new table. And so we just want to reproduce that experience again, backed up by a full Postgres dedicated database. [00:06:13] Jeremy: so when you're working with a Postgres database, normally you need some kind of layer in front of it, right? That the person can't open up their website and connect directly to Postgres from their browser. And you mentioned PostgREST before. I wonder if you could explain a little bit about what that is and how it works.[00:06:34] Ant: Yeah, definitely. so yeah, PostgREST has been around for a while. Um, It's basically an, a server that you connect to, to your Postgres database and it introspects your schemas and generates an API for you based on the table names, the column names. And then you can basically then communicate with your Postgres database via this restful API.So you can do pretty much, most of the filtering operations that you can do in SQL um, uh, equality filters. You can even do full text search over the API. So it just means that whenever you obviously add a new table or a new schema or a new column the API just updates instantly. So you, you don't have to worry about writing that, that middle layer which is, was always the drag right.When, what have you started a new project. It's like, okay, I've got my schema, I've got my client. Now I have to do all the connecting code in the middle of which is kind of, yeah, no, no developers should need to write that layer in 2022.[00:07:46] Jeremy: so this the layer you're referring to, when I think of a traditional. Web application. I think of having to write routes controllers and, and create this, this sort of structure where I know all the tables in my database, but the controllers I create may not map one to one with those tables. And so you mentioned a little bit about how PostgREST looks at the schema and starts to build an API automatically.And I wonder if you could explain a little bit about how it does those mappings or if you're writing those yourself. [00:08:21] Ant: Yeah, it basically does them automatically by default, it will, you know, map every table, every column. When you want to start restricting things. Well, there's two, there's two parts to this. There's one thing which I'm sure we'll get into, which is how is this secure since you are communicating direct from the client.But the other part is what you mentioned giving like a reduced view of a particular date, bit of data. And for that, we just use Postgres views. So you define a view which might be, you know it might have joins across a couple of different tables or it might just be a limited set of columns on one of your tables. And then you can choose to just expose that view. [00:09:05] Jeremy: so it sounds like when you would typically create a controller and create a route. Instead you create a view within your Postgres database and then PostgREST can take that view and create an end point for it, map it to that.[00:09:21] Ant: Yeah, exactly (laughs) . [00:09:24] Jeremy: And, and PostgREST is an open source project. Right. I wonder if you could talk a little bit about sort of what its its history was. How did you come to choose it? [00:09:37] Ant: Yeah.I think, I think Paul probably read about it on hacker news at some point. Anytime it appears on hacker news, it just gets voted to the front page because it's, it's So awesome. And we, we got connected to the maintainer, Steve Chavez. At some point I think he just took an interest in, or we took an interest in Postgres and we kind of got acquainted.And then we found out that, you know, Steve was open to work and this kind of like probably shaped a lot of the way we think about building out supabase as a project and as a company in that we then decided to employ Steve full time, but just to work on PostgREST because it's obviously a huge benefit for us.We're very reliant on it. We want it to succeed because it helps our business. And then as we started to add the other components, we decided that we would then always look for existing tools, existing opensource projects that exist before we decided to build something from scratch. So as we're starting to try and replicate the features of Firebase we would and auth is a great example.We did a full audit of what are all the authorization, authentication, authentication open-source tools that are out there and which one was, if any, would fit best. And we found, and Netlify had built a library called gotrue written in go, which did pretty much exactly what we needed. So we just adopted that.And now obviously, you know, we, we just have a lot of people on the team contributing to, to gotrue as well.[00:11:17] Jeremy: you touched on this a little bit earlier. Normally when you connect to a Postgres database your user has permission to, to basically everything I guess, by default, anyways. And so. So, how does that work? Where when you want to restrict people's permissions, make sure they only get to see records they're allowed to see how has that all configured in PostgREST and what's happening behind the scenes?[00:11:44] Ant: Yeah, we, the great thing about Postgres is it's got this concept of row level security, which actually, I don't think I even rarely looked at until we were building out this auth feature where the security rules live in your database as SQL. So you do like a create policy query, and you say anytime someone tries to select or insert or update apply this policy.And then how it all fits together is our auth server go true. Someone will basically make a request to sign in or sign up with email and password, and we create that user inside the, database. They get issued a URL. And they get issued a JSON, web token, a JWT, and which, you know, when they, when they have it on the, client side, proves that they are this, you, you ID, they have access to this data.Then when they make a request via PostgREST, they send the JWT in the authorization header. Then Postgres will pull out that JWT check the sub claim, which is the UID and compare it to any rows in the database, according to the policy that you wrote. So, so the most basic one is you say in order to, to access this row, it must have a column you UID and it must match whatever is in the JWT.So we basically push the authorization down into the database which actually has, you know, a lot of other benefits in that as you write new clients, You don't need to have, have it live, you know, on an API layer on the client. It's kind of just, everything is managed from the database.[00:13:33] Jeremy: So the, the, you, you ID, you mentioned that represents the user, correct. [00:13:39] Ant: Yeah. [00:13:41] Jeremy: Is that, does that map to a user in post graphs or is there some other way that you're mapping those permissions?[00:13:50] Ant: Yeah. When, so when you connect go true, which is the auth server to your Postgres database for the first time, it installs its own schema. So you'll have an auth schema and inside will be all start users with a list of the users. It'll have a uh, auth dot tokens which will store all the access tokens that it's issued.So, and one of the columns on the auth start user's table will be UUID, and then whenever you write application specific schemers, you can just join a, do a foreign key relation to the author users table. So, so it all gets into schema design and and hopefully we do a good job of having some good education content in the docs as well.Because one of the things we struggled with from the start was how much do we abstract away from SQL away from Postgres and how much do we educate? And we actually landed on the educate sides because I mean, once you start learning about Postgres, it becomes kind of a superpower for you as a developer.So we'd much rather. Have people discover us because we're a firebase alternatives frontend devs then we help them with things like schema design landing about row level security. Because ultimately like every, if you try and abstract that stuff it gets kind of crappy. And maybe not such a great experience. [00:15:20] Jeremy: to make sure I understand correctly. So you have GoTrue, which is uh, a Netlify open-source project that GoTrue project creates some tables in your, your database that has like, you've mentioned the tokens, the, the different users. Somebody makes a request to GoTrue. Like here's my username, my password go true.Gives them back a JWT. And then from your front end, you send that JWT to the PostgREST endpoint. And from that JWT, it's able to know which user you are and then uses postgres' built in a row level security to figure out which rows you're, you're allowed to bring back. Did I, did I get that right?[00:16:07] Ant: That is pretty much exactly how it works. And it's impressive that you garnered that without looking at a single diagram (laughs) But yeah, and, and, and obviously we, we provide a client library supabase JS, which actually does a lot of this work for you. So you don't need to manually attach the JJ JWT in a header.If you've authenticated with supabase JS, then every request sent to PostgREST. After that point, the header will just be attached automatically, and you'll be in a session as that user. [00:16:43] Jeremy: and, and the users that we're talking about when we talk about Postgres' row level security. Are those actual users in PostgreSQL. Like if I was to log in with psql, I could actually log in with those users.[00:17:00] Ant: They're not, you could potentially structure it that way. But it would be more advanced it's it's basically just users in, in the auth.users table, the way, the way it's currently done. [00:17:12] Jeremy: I see and postgrest has the, that row level security is able to work with that table. You, you don't need to have actual Postgres users.[00:17:23] Ant: Exactly. And, and it's, it's basically turing complete. I mean, you can write extremely complex auth policies. You can say, you know, only give access to this particular admin group on a Thursday afternoon between six and 8:00 PM. You can get really, yeah. really as fancy as you want. [00:17:44] Jeremy: Is that all written in SQL or are there other languages they allow you to use?[00:17:50] Ant: Yeah. It's the default is plain SQL. Within Postgres itself, you can useI think you can use, like there's a Python extension. There's a JavaScript extension, which is a, I think it's a subsets of, of JavaScripts. I mean, this is the thing with Postgres, it's super extensible and people have probably got all kinds of interpreters.So you, yeah, you can use whatever you want, but the typical user will just use SQL. [00:18:17] Jeremy: interesting. And that applies to logic in general, I suppose, where if you were writing a rails application, you might write Ruby. Um, If you're writing a node application, you write JavaScript, but you're, you're saying in a lot of cases with PostgREST, you're actually able to do what you want to do, whether that's serialization or mapping objects, do that all through SQL.[00:18:44] Ant: Yeah, exactly, exactly. And then obviously like there's a lot of awesome other stuff that Postgres has like this postGIS, which if you're doing geo, if you've got like a geo application, it'll load it up with a geo types for you, which you can just use. If you're doing like encryption and decryption, we just added PG libsodium, which is a new and awesome cryptography extension.And so you can use all of these, these all add like functions, like SQL functions which you can kind of use in, in any parts of the logic or in the role level policies. Yeah.[00:19:22] Jeremy: and something I thought was a little unique about PostgREST is that I believe it's written in Haskell. Is that right?[00:19:29] Ant: Yeah, exactly. And it makes it fairly inaccessible to me as a result. But the good thing is it's got a thriving community of its own and, you know, people who on there's people who contribute probably because it's written in haskell. And it's, it's just a really awesome project and it's an excuse to, to contribute to it.But yeah. I, I think I did probably the intro course, like many people and beyond that, it's just, yeah, kind of inaccessible to me. [00:19:59] Jeremy: yeah, I suppose that's the trade-off right. Is you have a, a really passionate community about like people who really want to use Haskell and then you've got the, the, I guess the group like yourselves that looks at it and goes, oh, I don't, I don't know about this.[00:20:13] Ant: I would, I would love to have the time to, to invest in uh, but not practical right now. [00:20:21] Jeremy: You talked a little bit about the GoTrue project from Netlify. I think I saw on one of your blog posts that you actually forked it. Can you sort of explain the reasoning behind doing that?[00:20:34] Ant: Yeah, initially it was because we were trying to move extremely fast. So, so we did Y Combinator in 2020. And when you do Y Combinator, you get like a part, a group partner, they call it one of the, the partners from YC and they add a huge amount of external pressure to move very quickly. And, and our biggest feature that we are working on in that period was auth.And we just kept getting the question of like, when are you going to ship auth? You know, and every single week we'd be like, we're working on it, we're working on it. And um, and one of the ways we could do it was we just had to iterate extremely quickly and we didn't rarely have the time to, to upstream things correctly.And actually like the way we use it in our stack is slightly differently. They connected to MySQL, we connected to Postgres. So we had to make some structural changes to do that. And the dream would be now that we, we spend some time upstream and a lot of the changes. And hopefully we do get around to that.But the, yeah, the pace at which we've had to move over the last uh, year and a half has been kind of scary and, and that's the main reason, but you know, hopefully now we're a little bit more established. We can hire some more people to, to just focus on, go true and, and bringing the two folks back together. [00:22:01] Jeremy: it's just a matter of, like you said speed, I suppose, because the PostgREST you, you chose to continue working off of the existing open source project, right? [00:22:15] Ant: Yeah, exactly. Exactly. And I think the other thing is it's not a major part of Netlify's business, as I understand it. I think if it was and if both companies had more resource behind it, it would make sense to obviously focus on on the single codebase but I think both companies don't contribute as much resource as as we would like to, but um, but it's, it's for me, it's, it's one of my favorite parts of the stack to work on because it's written in go and I kind of enjoy how that it all fits together.So Yeah. I, I like to dive in there. [00:22:55] Jeremy: w w what about go, or what about how it's structured? Do you particularly enjoy about the, that part of the project?[00:23:02] Ant: I think it's so I actually learned learned go through, gotrue and I'm, I have like a Python and C plus plus background And I hate the fact that I don't get to use Python and C plus posts rarely in my day to day job. It's obviously a lot of type script. And then when we inherited this code base, it was kind of, as I was picking it up I, it just reminded me a lot of, you know, a lot of the things I loved about Python and C plus plus, and, and the tooling around it as well. I just found to be exceptional. So, you know, you just do like a small amounts of conflig. Uh config, And it makes it very difficult to, to write bad code, if that makes sense.So the compiler will just, boot you back if you try and do something silly which isn't necessarily the case with, with JavaScript. I think TypeScript is a little bit better now, but Yeah, I just, it just reminded me a lot of my Python and C days.[00:24:01] Jeremy: Yeah, I'm not too familiar with go, but my understanding is that there's, there's a formatter that's a part of the language, so there's kind of a consistency there. And then the language itself tries to get people to, to build things in the same way, or maybe have simpler ways of building things. Um, I don't, I don't know.Maybe that's part of the appeal.[00:24:25] Ant: Yeah, exactly. And the package manager as well is great. It just does a lot of the importing automatically. and makes sure like all the declarations at the top are formatted correctly and, and are definitely there. So Yeah. just all of that tool chain is just really easy to pick up.[00:24:46] Jeremy: Yeah. And I, and I think compiled languages as well, when you have the static type checking. By the compiler, you know, not having things blow up and run time. That's, that's just such a big relief, at least for me in a lot of cases,[00:25:00] Ant: And I just loved the dopamine hits of when you compile something on it actually compiles this. I lose that with, with working with JavaScript. [00:25:11] Jeremy: for sure. One of the topics you mentioned earlier was how super base provides real-time database updates. And which is something that as far as I know is not natively a part of Postgres. So I wonder if you could explain a little bit about how that works and how that came about.[00:25:31] Ant: Yeah. So, So Postgres, when you add replication databases the way it does is it writes everything to this thing called the write ahead log, which is basically all the changes that uh, have, are going to be applied to, to the database. And when you connect to like a replication database. It basically streams that log across.And that's how the replica knows what, what changes to, to add. So we wrote a server, which basically pretends to be a Postgres rep, replica receives the right ahead log encodes it into JSON. And then you can subscribe to that server over web sockets. And so you can choose whether to subscribe, to changes on a particular schema or a particular table or particular columns, and even do equality matches on rows and things like this.And then we recently added the role level security policies to the real-time stream as well. So that was something that took us a while to, cause it was probably one of the largest technical challenges we've faced. But now that it's in the real-time stream is, is fully secure and you can apply these, these same policies that you apply over the CRUD API as well.[00:26:48] Jeremy: So for that part, did you have to look into the internals of Postgres and how it did its row level security and try to duplicate that in your own code?[00:26:59] Ant: Yeah, pretty much. I mean it's yeah, it's fairly complex and there's a guy on our team who, well, for him, it didn't seem as complex, let's say (laughs) , but yeah, that's pretty much it it's just a lot of it's effectively a SQL um, a Postgres extension itself, uh which in-in interprets those policies and applies them to, to the, to the, the right ahead log.[00:27:26] Jeremy: and this piece that you wrote, that's listening to the right ahead log. what was it written in and, and how did you choose that, that language or that stack?[00:27:36] Ant: Yeah. That's written in the Elixir framework which is based on Erlang very horizontally scalable. So any applications that you write in Elixir can kind of just scale horizontally the message passing and, you know, go into the billions and it's no problem. So it just seemed like a sensible choice for this type of application where you don't know.How large the wall is going to be. So it could just be like a few changes per second. It could be a million changes per second, then you need to be able to scale out. And I think Paul who's my co-founder originally, he wrote the first version of it and I think he wrote it as an excuse to learn Elixir, which is how, a lot of probably how PostgREST ended up being Haskell, I imagine.But uh, but it's meant that the Elixir community is still like relatively small. But it's a group of like very passionate and very um, highly skilled developers. So when we hire from that pool everyone who comes on board is just like, yeah, just, just really good and really enjoy is working with Elixir.So it's been a good source of a good source for hires as well. Just, just using those tools. [00:28:53] Jeremy: with a feature like this, I'm assuming it's where somebody goes to their website. They make a web socket connection to your application and they receive the updates that way. How have you seen how far you're able to push that in terms of connections, in terms of throughput, things like that?[00:29:12] Ant: Yeah, I don't actually have the numbers at hand. But we have, yeah, we have a team focused on obviously maximizing that but yeah, I don't I don't don't have those numbers right now. [00:29:24] Jeremy: one of the last things you've you've got on your website is a storage project or a storage product, I should say. And I believe it's written in TypeScript, so I was curious, we've got PostGrest, which is in Haskell. We've got go true and go. Uh, We've got the real-time database part in elixir.And so with storage, how did we finally get to TypeScript?[00:29:50] Ant: (Laughs) Well, the policy we kind of landed on was best tool for the job. Again, the good thing about being an open source is we're not resource constrained by the number of people who are in our team. It's by the number of people who are in the community and I'm willing to contribute. And so for that, I think one of the guys just went through a few different options that we could have went with, go just to keep it in line with a couple of the other APIs.But we just decided, you know, a lot of people well, everyone in the team like TypeScript is kind of just a given. And, and again, it was kind of down to speed, like what's the fastest uh we can get this up and running. And I think if we use TypeScript, it was, it was the best solution there. But yeah, but we just always go with whatever is best.Um, We don't worry too much uh, about, you know, the resources we have because the open source community has just been so great in helping us build supabase. And building supabase is like building like five companies at the same time actually, because each of these vertical stacks could be its own startup, like the auth stack And the storage layer, and all of this stuff.And you know, each has, it does have its own dedicated team. So yeah. So we're not too worried about the variation in languages.[00:31:13] Jeremy: And the storage layer is this basically a wrapper around S3 or like what is that product doing?[00:31:21] Ant: Yeah, exactly. It's it's wraparound as three. It, it would also work with all of the S3 compatible storage systems. There's a few Backblaze and a few others. So if you wanted to self host and use one of those alternatives, you could, we just have everything in our own S3 booklets inside of AWS.And then the other awesome thing about the storage system is that because we store the metadata inside of Postgres. So basically the object tree of what buckets and folders and files are there. You can write your role level policies against the object tree. So you can say this, this user should only access this folder and it's, and it's children which was kind of. Kind of an accident. We just landed on that. But it's one of my favorite things now about writing applications and supervisors is the rollover policies kind of work everywhere.[00:32:21] Jeremy: Yeah, it's interesting. It sounds like everything. Whether it's the storage or the authentication it's all comes back to postgres, right? At all. It's using the row level security. It's using everything that you put into the tables there, and everything's just kind of digging into that to get what it needs.[00:32:42] Ant: Yeah. And that's why I say we are a database company. We are a Postgres company. We're all in on postgres. We got asked in the early days. Oh, well, would you also make it my SQL compatible compatible with something else? And, but the amounts. Features Postgres has, if we just like continue to leverage them then it, it just makes the stack way more powerful than if we try to you know, go thin across multiple different databases.[00:33:16] Jeremy: And so that, that kind of brings me to, you mentioned how your Postgres companies, so when somebody signs up for supabase they create their first instance. What's what's happening behind the scenes. Are you creating a Postgres instance for them in a container, for example, how do you size it? That sort of thing.[00:33:37] Ant: Yeah. So it's basically just easy to under the hood for us we, we have plans eventually to be multi-cloud. But again, going down to the speed of execution that the. The fastest way was to just spin up a dedicated instance, a dedicated Postgres instance per user on EC2. We do also package all of the API APIs together in a second EC2 instance.But we're starting to break those out into clustered services. So for example, you know, not every user will use the storage API, so it doesn't make sense to Rooney for every user regardless. So we've, we've made that multitenant, the application code, and now we just run a huge global cluster which people connect through to access the S3 bucket.Basically and we're gonna, we have plans to do that for the other services as well. So right now it's you got two EC2 instances. But over time it will be just the Postgres instance and, and we wanted. Give everyone a dedicated instance, because there's nothing worse than sharing database resource with all the users, especially when you don't know how heavily they're going to use it, whether they're going to be bursty.So I think one of the things we just said from the start is everyone gets a Postgres instance and you get access to it as well. You can use your Postgres connection string to, to log in from the command line and kind of do whatever you want. It's yours.[00:35:12] Jeremy: so did it, did I get it right? That when I sign up, I create a super base account. You're actually creating an two instance for me specifically. So it's like every customer gets their, their own isolated it's their own CPU, their own Ram, that sort of thing.[00:35:29] Ant: Yeah, exactly, exactly. And, and the way the. We've set up the monitoring as well, is that we can expose basically all of that to you in the dashboard as well. so you can, you have some control over like the resource you want to use. If you want to a more powerful instance, we can do that. A lot of that stuff is automated.So if someone scales beyond the allocated disk size, the disk will automatically scale up by 50% each time. And we're working on automating a bunch of these, these other things as well.[00:36:03] Jeremy: so is it, is it where, when you first create the account, you might create, for example, a micro instance, and then you have internal monitoring tools that see, oh, the CPU is getting heady hit pretty hard. So we need to migrate this person to a bigger instance, that kind of thing.[00:36:22] Ant: Yeah, pretty much exactly. [00:36:25] Jeremy: And is that, is that something that the user would even see or is it the case of where you send them an email and go like, Hey, we notice you're hitting the limits here. Here's what's going to happen. [00:36:37] Ant: Yeah.In, in most cases it's handled automatically. There are people who come in and from day one, they say has my requirements. I'm going to have this much traffic. And I'm going to have, you know, a hundred thousand users hitting this every hour. And in those cases we will over-provisioned from the start.But if it's just the self service case, then it will be start on a smaller instance and an upgrade over time. And this is one of our biggest challenges over the next five years is we want to move to a more scalable Postgres. So cloud native Postgres. But the cool thing about this is there's a lot of.Different companies and individuals working on this and upstreaming into Postgres itself. So for us, we don't need to, and we, and we would never want to fork Postgres and, you know, and try and separate the storage and the the computes. But more we're gonna fund people who are already working on this so that it gets upstreamed into Postgres itself.And it's more cloud native. [00:37:46] Jeremy: Yeah. So I think the, like we talked a little bit about how Firebase was the original inspiration and when you work with Firebase, you, you don't think about an instance at all, right? You, you just put data in, you get data out. And it sounds like in this case, you're, you're kind of working from the standpoint of, we're going to give you this single Postgres instance.As you hit the limits, we'll give you a bigger one. But at some point you, you will hit a limit of where just that one instance is not enough. And I wonder if there's you have any plans for that, or if you're doing anything currently to, to handle that.[00:38:28] Ant: Yeah. So, so the medium goal is to do replication like horizontal scaling. We, we do that for some users already but we manually set that up. we do want to bring that to the self serve model as well, where you can just choose from the start. So I want, you know, replicas in these, in these zones and in these different data centers.But then, like I said, the long-term goal is that. it's not based on. Horizontally scaling a number of instances it's just a Postgres itself can, can scale out. And I think we will get to, I think, honestly, the race at which the Postgres community is working, I think we'll be there in two years.And, and if we can contribute resource towards that, that goal, I think yeah, like we'd love to do that, but yeah, but for now, it's, we're working on this intermediate solution of, of what people already do with, Postgres, which is, you know, have you replicas to make it highly available.[00:39:30] Jeremy: And with, with that, I, I suppose at least in the short term, the goal is that your monitoring software and your team is handling the scaling up the instance or creating the read replicas. So to the user, it, for the most part feels like a managed service. And then yeah, the next step would be to, to get something more similar to maybe Amazon's Aurora, I suppose, where it just kind of, you pay per use.[00:40:01] Ant: Yeah, exactly. Exactly. Aurora was kind of the goal from the start. It's just a shame that it's proprietary. Obviously. [00:40:08] Jeremy: right. Um, but it sounds, [00:40:10] Ant: the world would be a better place. If aurora was opensource. [00:40:15] Jeremy: yeah. And it sounds like you said, there's people in the open source community that are, that are trying to get there. just it'll take time. to, to all this, about making it feel seamless, making it feel like a serverless experience, even though internally, it really isn't, I'm guessing you must have a fair amount of monitoring or ways that you're making these decisions.I wonder if you can talk a little bit about, you know, what are the metrics you're looking at and what are the applications you're you have to, to help you make these decisions?[00:40:48] Ant: Yeah. definitely. So we started with Prometheus which is a, you know, metrics gathering tool. And then we moved to Victoria metrics which was just easier for us to scale out. I think soon we'll be managing like a hundred thousand Postgres databases will have been deployed on, on supabase. So definitely, definitely some scale. So this kind of tooling needs to scale to that as well. And then we have agents kind of everywhere on each application on, on the database itself. And we listen for things like the CPU and the Ram and the network IO. We also poll. Uh, Postgres itself. Th there's a extension called PG stats statements, which will give us information about what are, the intensive queries that are running on that, on that box.So we just collect as much of this as possible um, which we then obviously use internally. We set alerts to, to know when, when we need to upgrade in a certain direction, but we also have an end point where the dashboard subscribes to these metrics as well. So the user themselves can see a lot of this information.And we, I think at the moment we do a lot of the, the Ram the CPU, that kind of stuff, but we're working on adding just more and more of these observability metrics uh, so people can can know it could, because it also helps with Let's say you might be lacking an index on a particular table and not know about it.And so if we can expose that to you and give you alerts about that kind of thing, then it obviously helps with the developer experience as well.[00:42:29] Jeremy: Yeah. And th that brings me to something that I, I hear from platform as a service companies, where if a user has a problem, whether that's a crash or a performance problem, sometimes it can be difficult to distinguish between is it a problem in their application or is this a problem in super base or, you know, and I wonder how your support team kind of approaches that.[00:42:52] Ant: Yeah, no, it's, it's, it's a great question. And it's definitely something we, we deal with every day, I think because of where we're at as a company we've always seen, like, we actually have a huge advantage in that.we can provide. Rarely good support. So anytime an engineer joins super base, we tell them your primary job is actually frontline support.Everything you do afterwards is, is secondary. And so everyone does a four hour shift per week of, of working directly with the customers to help determine this kind of thing. And where we are at the moment is we are happy to dive in and help people with their application code because it helps our engineers land about how it's being used and where the pitfalls are, where we need better documentation, where we need education.So it's, that is all part of the product at the moment, actually. And, and like I said, because we're not a 10,000 person company we, it's an advantage that we have, that we can deliver that level of support at the moment. [00:44:01] Jeremy: w w what are some of the most common things you see happening? Like, is it I would expect you mentioned indexing problems, but I'm wondering if there's any specific things that just come up again and again,[00:44:15] Ant: I think like the most common is people not batching their requests. So they'll write an application, which, you know, needs to, needs to pull 10,000 rows and they send 10,000 requests (laughs) . That that's, that's a typical one for, for people just getting started maybe. Yeah. and, then I think the other thing we faced in the early days was. People storing blobs in the database which we obviously solve that problem by introducing file storage. But people will be trying to store, you know, 50 megabytes, a hundred megabyte files in Postgres itself, and then asking why the performance was so bad.So I think we've, we've mitigated that one by, by introducing the blob storage.[00:45:03] Jeremy: and when you're, you mentioned you have. Over a hundred thousand instances running. I imagine there have to be cases where an incident occurs, where something doesn't go quite right. And I wonder if you could give an example of one and how it was resolved.[00:45:24] Ant: Yeah, it's a good question. I think, yeah, w w we've improved the systems since then, but there was a period where our real time server wasn't able to handle rarely large uh, right ahead logs. So w there was a period where people would just make tons and tons of requests and updates to, to Postgres. And the real time subscriptions were failing. But like I said, we have some really great Elixir devs on the team, so they were able to jump on that fairly quickly. And now, you know, the application is, is way more scalable as a result. And that's just kind of how the support model works is you have a period where everything is breaking and then uh, then you can just, you know, tackle these things one by one. [00:46:15] Jeremy: Yeah, I think any, anybody at a, an early startup is going to run into that. Right? You put it out there and then you find out what's broken, you fix it and you just get better and better as it goes along.[00:46:28] Ant: Yeah, And the funny thing was this model of, of deploying EC2 instances. We had that in like the first week of starting super base, just me and Paul. And it was never intended to be the final solution. We just kind of did it quickly and to get something up and running for our first handful of users But it's scaled surprisingly well.And actually the things that broke as we started to get a lot of traffic and a lot of attention where was just silly things. Like we give everyone their own domain when they start a new project. So you'll have project ref dot super base dot in or co. And the things that were breaking where like, you know, we'd ran out of sub-domains with our DNS provider and then, but, and those things always happen in periods of like intense traffic.So we ha we were on the front page of hacker news, or we had a tech crunch article, and then you discover that you've ran out of sub domains and the last thousand people couldn't deploy their projects. So that's always a fun a fun challenge because you are then dependent on the external providers as well and theirs and their support systems.So yeah, I think. We did a surprisingly good job of, of putting in good infrastructure from the start. But yeah, all of these crazy things just break when obviously when you get a lot of, a lot of traffic[00:48:00] Jeremy: Yeah, I find it interesting that you mentioned how you started with creating the EC2 instances and it turned out that just work. I wonder if you could walk me through a little bit about how it worked in the beginning, like, was it the two of you going in and creating instances as people signed up and then how it went from there to where it is today?[00:48:20] Ant: yeah. So there's a good story about, about our fast user, actually. So me and Paul used to contract for a company in Singapore, which was an NFT company. And so we knew the lead developer very well. And we also still had the Postgres credentials on, on our own machines. And so what we did was we set up the th th the other funny thing is when we first started, we didn't intend to host the database.We, we thought we were just gonna host the applications that would connect to your existing Postgres instance. And so what we did was we hooked up the applications to, to the, to the Postgres instance of this, of this startup that we knew very well. And then we took the bus to their office and we sat with the lead developer, and we said, look, we've already set this thing up for you.What do you think. know, when, when you think like, ah, we've, we've got the best thing ever, but it's not until you put it in front of someone and you see them, you know, contemplating it and you're like, oh, maybe, maybe it's not so good. Maybe we don't have anything. And we had that moment of panic of like, oh, maybe we just don't maybe this isn't great.And then what happened was he didn't like use us. He didn't become a supabase user. He asked to join the team. [00:49:45] Jeremy: nice, nice.[00:49:46] Ant: that was a good a good kind of a moment where we thought, okay, maybe we have got something, maybe this is maybe this isn't terrible. So, so yeah, so he became our first employee. Yeah. [00:49:59] Jeremy: And so yeah, so, so that case was, you know, the very beginning you set everything up from, from scratch. Now that you have people signing up and you have, you know, I don't know how many signups you get a day. Did you write custom infrastructure or applications to do the provisioning or is there an open source project that you're using to handle that[00:50:21] Ant: Yeah. It's, it's actually mostly custom. And you know, AWS does a lot of the heavy lifting for you. They just provide you with a bunch of API end points. So a lot of that is just written in TypeScript fairly straightforward and, and like I said, you never intended to be the thing that last. Two years into the business.But it's, it's just scaled surprisingly well. And I'm sure at some point we'll, we'll swap it out for some I don't orchestration tooling like Pulumi or something like this. But actually the, what we've got just works really well.[00:50:59] Ant: Be because we're so into Postgres our queuing system is a Postgres extension called PG boss. And then we have a fleet of workers, which are. Uh, We manage on EC ECS. Um, So it's just a bunch of VMs basically which just subscribed to the, to the queue, which lives inside the database.And just performs all the, whether it be a project creation, deletion modification a whole, whole suite of these things. Yeah. [00:51:29] Jeremy: very cool. And so even your provisioning is, is based on Postgres.[00:51:33] Ant: Yeah, exactly. Exactly (laughs) . [00:51:36] Jeremy: I guess in that case, I think, did you say you're using the right ahead log there to in order to get notifications?[00:51:44] Ant: We do use real time, and this is the fun thing about building supabase is we use supabase to build supabase. And a lot of the features start with things that we build for ourselves. So the, the observability features we have a huge logging division. So, so w we were very early users of a tool called a log flare, which is also written in Elixir.It's basically a log sync backed up by BigQuery. And we loved it so much and we became like super log flare power users that it was kind of, we decided to eventually acquire the company. And now we can just offer log flare to all of our customers as well as part of using supabase. So you can query your logs and get really good business intelligence on what your users um, consuming in from your database.[00:52:35] Jeremy: the lock flare you're mentioning though, you said that that's a log sink and that that's actually not going to Postgres, right. That's going to a different type of store.[00:52:43] Ant: Yeah. That is going to big query actually. [00:52:46] Jeremy: Oh, big query. Okay. [00:52:47] Ant: yeah, and maybe eventually, and this is the cool thing about watching the Postgres progression is it's become. It's bringing like transactional and analytical databases together. So it's traditionally been a great transactional database, but if you look at a lot of the changes that have been made in recent versions, it's becoming closer and closer to an analytical database.So maybe at some point we will use it, but yeah, but big query works just great. [00:53:18] Jeremy: Yeah. It's, it's interesting to see, like, I, I know that we've had episodes on different extensions to Postgres where I believe they change out how the storage works. So there's yeah, it's really interesting how it's it's this one database, but it seems like it can take so many different forms. [00:53:36] Ant: It's just so extensible and that's why we're so bullish on it because okay. Maybe it wasn't always the best database, but now it seems like it is becoming the best database and the rate at which it's moving. It's like, where's it going to be in five years? And we're just, yeah, we're just very bullish on, on Postgres.As you can tell from the amount of mentions it's had in this episode.[00:54:01] Jeremy: yeah, we'll have to count how many times it's been said. I'm sure. It's, I'm sure it's up there. Is there anything else we, we missed or think you should have mentioned.[00:54:12] Ant: No, some of the things we're excited about are cloud functions. So it's the thing we just get asked for the most at anytime we post anything on Twitter, you're guaranteed to get a reply, which is like when functions. And we're very pleased to say that it's, it's almost there. So um, that will hopefully be a really good developer experience where also we launched like a, a graph QL Postgres extension where the resolver lives inside of Postgres.And that's still in early alpha, but I think I'm quite excited for when we can start offering that on the on the hosted platform as well. People will have that option to, to use GraphQL instead of, or as well as the restful API.[00:55:02] Jeremy: the, the common thread here is that PostgreSQL you're able to take it really, really far. Right. In terms of scale up, eventually you'll have the read replicas. Hopefully you'll have. Some kind of I don't know what you would call Aurora, but it's, it's almost like self provisioning, maybe not sharing what, how you describe it.But I wonder as a, as a company, like we talked about big query, right? I wonder if there's any use cases that you've come across, either from customers or in your own work where you're like, I just, I just can't get it to fit into Postgres.[00:55:38] Ant: I think like, not very often, but sometimes we'll, we will respond to support requests and recommend that people use Firebase. they're rarelylike if, if they really do have like large amounts of unstructured data, which is which, you know, documented storage is, is kind of perfect for that. We'll just say, you know, maybe you should just use Firebase.So we definitely come across things like that. And, and like I said, we love, we love Firebase, so we're definitely not trying to, to uh, destroy as a tool. I think it, it has its use cases where it's an incredible tool yeah. And provides a lot of inspiration for, for what we're building as well. [00:56:28] Jeremy: all right. Well, I think that's a good place to, to wrap it up, but where can people hear more about you hear more about supabase?[00:56:38] Ant: Yeah, so supeabase is at supabase.com. I'm on Twitter at ant Wilson. Supabase is on Twitter at super base. Just hits us up. We're quite active on the and then definitely check out the repose gets up.com/super base. There's lots of great stuff to dig into as we discussed. There's a lot of different languages, so kind of whatever you're into, you'll probably find something where you can contribute. [00:57:04] Jeremy: Yeah, and we, we sorta touched on this, but I think everything we've talked about with the exception of the provisioning part and the monitoring part is all open source. Is that correct? [00:57:16] Ant: Yeah, exactly.And as, yeah. And hopefully everything we build moving forward, including functions and graph QL we'll continue to be open source.[00:57:31] Jeremy: And then I suppose the one thing I, I did mean to touch on is what, what is the, the license for all the components you're using that are open source?[00:57:41] Ant: It's mostly Apache2 or MIT. And then obviously Postgres has its own Postgres license. So as long as it's, it's one of those, then we, we're not too precious. I, As I said, we inherit a fair amounts of projects. So we contribute to and adopt projects. So as long as it's just very permissive, then we don't care too much.[00:58:05] Jeremy: As far as the projects that your team has worked on, I've noticed that over the years, we've seen a lot of companies move to things like the business source license or there's, there's all these different licenses that are not quite so permissive. And I wonder like what your thoughts are on that for the future of your company and why you think that you'll be able to stay permissive.[00:58:32] Ant: Yeah, I really, really, rarely hope that we can stay permissive. forever. It's, it's a philosophical thing for, for us. You know, when we, we started the business, it's what just very, just very, as individuals into the idea of open source. And you know, if, if, if AWS come along at some point and offer hosted supabase on AWS, then it will be a signal that where we're doing something.Right. And at that point we just, I think we just need to be. The best team to continue to move super boost forward. And if we are that, and I, I think we will be there and then hopefully we will never have to tackle this this licensing issue. [00:59:19] Jeremy: All right. Well, I wish you, I wish you luck.[00:59:23] Ant: Thanks. Thanks for having me. [00:59:25] Jeremy: This has been Jeremy Jung for software engineering radio. Thanks for listening.
Robert-Jan Huijsman is zo'n nerd die je bijna elk seizoen van onze podcast een keer tegenkomt. De eerste keer dat Robert-Jan aan tafel schoof was hij programmeur bij Google. Later startte hij zijn eigen lenzenscannerstartup om vervolgens even rust te pakken. Hij is fanatiek acapella-zanger en opnieuw begonnen aan een start-up avontuur.Dit keer behelst zijn start-up het schrijven van gedistribueerde software die zo makkelijk is dat iedere data-analist er mee moet kunnen werken *kuch* net als SQL *kuch*, in plaats van dat grote bedrijven er hele teams voor in dienst hebben. Dat doet hij met een team indrukwekkende namen die hij zelf naar eigen inzicht wel of niet mag onthullen in de podcast. Hun uitdaging zóu kunnen beteken dat ze een nieuwe programmeertaal gaan schrijven. En daar weten wij natuurlijk precies NIETS van.Maar gelukkig hebben wij Felienne Hermans bereid gevonden ons inhoudelijk bij te staan. Ook zij was eerder te horen, in MNOT S06E16. Ze is universitair hoofddocent bij het Leiden Institute of Advanced Computer Science, vaste waarde in Software Engineering Radio, en denkt als lid van de TC39-commissie mee over de toekomst van Javascript. Ze is maker van een massive open online course over Scratch programmeren voor kinderen en fanatiek marathonloper.ReclameDeze aflevering wordt mede mogelijk gemaakt door Bit Academy. Wil jij ook leren programmeren? Ga dan snel naar Bit-Academy.nl en klik op ‘NL Leert Door' om je aan te melden.Tijdschema00:00:00 Waar is mijn bestelgeschiedenis?00:04:09 Voorstellen: Robert-Jan Huijsman00:05:56 Een nieuwe gedistribueerde startup00:22:08 Reboot.dev00:39:08 Is dit het wiel opnieuw uitvinden?00:42:33 Wat is er anders aan dan Docker?00:45:21 Wie zijn de klanten?01:04:38 Wat is het buisnessmodel?01:15:08 Hoe gaat Robert-Jan dat bouwen?01:19:59 Reclame Bit-Academy01:22:17 Vragen van de luisteraars01:33:55 Tips01:49:05 AfkondigingTipsRandal PeelenMilky Road Brewery road tripFloris DiemelAls je een 4K TV koopt, weest zeker dat je ook 4K content kunt afspelen. Ik wil mijn TV niet aan het grote boze internet, dus moet een externe player. En toen bleek niet de player, maar het netwerk de weakest link en momenteel is de server weer de boosdoener. GEK WORDT JE ERVAN. Aldus: regel dit voordat je naar 4K of 8K wenst over te stappen ;)Robert-Jan HuijsmanSONY WF1000XM4Internote A CapellaFelienne HermansDingen uit je hoofd leren? Flitskaarten eens checken! Heel eenvoudig en supercheap (kost je wat inkt en papier). Werkt echt geniaal!CodeReadingClubOver leren programmeren: meer lezen dan schrijven. Begrijpen helpt enorm om te kunnen gaan coden immers. Denk eens aan het doorlezen van code van een collega, bespreek gewoon wat het doel is en hoe erheen is gewerkt. Lezen is natuurlijk ook fijn, want je hoeft niet te compilen en te runnen!The Programmer's Brain (boek)Zie het privacybeleid op https://art19.com/privacy en de privacyverklaring van Californië op https://art19.com/privacy#do-not-sell-my-info.
Alexander Pugh is a software engineer at Albertsons. He has worked in Robotic Process Automation and the cognitive services industry for over five years.This episode originally aired on Software Engineering Radio.Related LinksAlexander Pugh's personal siteEnterprise RPA Solutions Automation Anywhere UiPath blueprism Enterprise "Low Code/No Code" API Solutions appian mulesoft Power Automate RPA and the OS Office primary interop assemblies Office Add-ins documentation Task Scheduler for developers The Component Object Model The Document Object Model TranscriptYou can help edit this transcript on GitHub.[00:00:00] Jeremy: Today, I'm talking to Alexander Pugh. He's a solutions architect with over five years of experience working on robotic process automation and cognitive services. Today, we're going to focus on robotic process automation. Alexander welcome to software engineering radio. [00:00:17] Alex: Thank you, Jeremy. It's really good to be here. [00:00:18] Jeremy: So what does robotic process automation actually mean? [00:00:23] Alex: Right. It's a, it's a very broad nebulous term. when we talk about robotic process automation, as a concept, we're talking about automating things that humans do in the way that they do them. So that's the robotic, an automation that is, um, done in the way a human does a thing.Um, and then process is that thing, um, that we're automating. And then automation is just saying, we're turning this into an automation where we're orchestrating this and automating this. and the best way to think about that in any other way is to think of a factory or a car assembly line. So initially when we went in and we, automated a car or factory, automation line, what they did is essentially they replicated the process as a human did it. So one day you had a human that would pick up a door and then put it on the car and bolt it on with their arms. And so the initial automations that we had on those factory lines were a robot arm that would pick up that door from the same place and put it on the car and bolt it on there.Um, so the same can be said for robotic process automation. We're essentially looking at these, processes that humans do, and we're replicating them, with an automation that does it in the same way. Um, and where we're doing that is the operating system. So robotic process automation is essentially going in and automating the operating system to perform tasks the same way a human would do them in an operating system.So that's, that's RPA in a nutshell, Jeremy: So when you say you're replicating something that a human would do, does it mean it has to go through some kind of GUI or some kind of user interface?[00:02:23] Alex: That's exactly right, actually. when we're talking about RPA and we look at a process that we want to automate with RPA, we say, okay. let's watch the human do it. Let's record that. Let's document the human process. And then let's use the RPA tool to replicate that exactly in that way.So go double click on Chrome, launch that click in the URL line and send key in www.cnn.com or what have you, or servicenow hit enter, wait for it to load and then click, you know, where you want to, you know, fill out your ticket for service. Now send key in. So that's exactly how an RPA solution at the most basic can be achieved.Now and any software engineer knows if you sit there and look over someone's shoulder and watch them use an operating system. Uh, you'll say, well, there's a lot of ways we can do this more efficiently without going over here, clicking that, you know, we can, use a lot of services that the operating system provides in a programmatic way to achieve the same ends and RPA solutions can also do that.The real key is making sure that it is still achieving something that the human does and that if the RPA solution goes away, a human can still achieve it. So if you're, trying to replace or replicate a process with RPA, you don't want to change that process so much so that a human can no longer achieve it as well.that's something where if you get a very technical, and very fluent software engineer, they lose sight of that because they say, oh, you know what? There's no reason why we need to go open a browser and go to you know, the service now portal and type this in when I can just directly send information to their backend.which a human could not replicate. Right? So that's kind of where the line gets fuzzy. How efficiently can we make this RPA solution? [00:04:32] Jeremy: I, I think a question that a lot of people are probably having is a lot of applications have APIs now. but what you're saying is that for it to, to be, I suppose, true RPA, it needs to be something that a user can do on their own and not something that the user can do by opening up dev tools or making a post to an end point.[00:04:57] Alex: Yeah. And so this, this is probably really important right now to talk about why RPA, right? Why would you do this when you could put on a server, a a really good, API ingestion point or trigger or a web hook that can do this stuff. So why would we, why would we ever pursue our RPA?There there's a lot of good reasons for it. RPA is very, very enticing to the business. RPA solutions and tools are marketed as a low code, no code solution for the business to utilize, to solve their processes that may not be solved by an enterprise solution and the in-between processes in a way.You have, uh, a big enterprise, finance solution that everyone uses for the finance needs of your business, but there are some things that it doesn't provide for that you have a person that's doing a lot of, and the business says, Okay. well, this thing, this human is doing this is really beneath their capability. We need to get a software solution for it, but our enterprise solution just can't account for it. So let's get a RPA capability in here. We can build it ourselves, and then there we go. So there, there are many reasons to do that. financial, IT might not have, um, the capability or the funding to actually build and solve the solution. Or it it's at a scale that is too small to open up, uh, an IT project to solve for. Um, so, you know, a team of five is just doing this and they're doing it for, you know, 20 hours a week, which is uh large, but in a big enterprise, that's not really. Maybe um, worth building an enterprise solution for it. or, and this is a big one. There are regulatory constraints and security constraints around being able to access this or communicate some data or information in a way that is non-human or programmatic. So that's really where, um, RPA is correctly and best applied and you'll see it most often.So what we're talking about there is in finance, in healthcare or in big companies where they're dealing with a lot of user data or customer data in a way. So when we talk about finance and healthcare, there are a lot of regulatory constraints and security reasons why you would not enable a programmatic solution to operate on your systems. You know, it's just too hard. We we're not going to expose our databases or our data to any other thing. It would, it would take a huge enterprise project to build out that capability, secure that capability and ensure it's going correctly. We just don't have the money the time or the strength honestly, to afford for it.So they say, well, we already have. a user pattern. We already allow users to, to talk to this information and communicate this information. Let's get an RPA tool, which for all intents and purposes will be acting as a user. And then it can just automate that process without us exposing to queries or any other thing, an enterprise solution or programmatic, um, solution.So that's really why RPA, where and why you, you would apply it is there's, there's just no capability at enterprise for one reason or another to solve for it. [00:08:47] Jeremy: as software engineers, when we see this kind of problem, our first thought is, okay, let's, let's build this custom application or workflow. That's going to talk to all these API APIs. And, and what it sounds like is. In a lot of cases there just isn't the time there just isn't the money, to put in the effort to do that.And, it also sounds like this is a way of being able to automate that. and maybe introducing less risk because you're going through the same, security, the same workflow that people are doing currently. So, you know, you're not going to get into things that they're not supposed to be able to get into because all of that's already put in place.[00:09:36] Alex: Correct. And it's an already accepted pattern and it's kind of odd to apply that kind of very IT software engineer term to a human user, but a human user is a pattern in software engineering. We have patterns that do this and that, and, you know, databases and not, and then the user journey or the user permissions and security and all that is a pattern.And that is accepted by default when you're building these enterprise applications okay.What's the user pattern. And so since that's already established and well-known, and all the hopefully, you know, walls are built around that to enable it to correctly do what it needs to do. It's saying, Okay. we've already established that. Let's just use that instead of. You know, building a programmatic solution where we have to go and find, do we already have an appropriate pattern to apply to it? Can we build it in safe way? And then can we support it? You know, all of a sudden we, you know, we have the support teams that, you know, watch our Splunk dashboards and make sure nothing's going down with our big enterprise application.And then you're going to build a, another capability. Okay. WHere's that support going to come from? And now we got to talk about change access boards, user acceptance testing and, uh, you know, UAT dev production environments and all that. So it becomes, untenable, depending on your, your organization to, to do that for things that might fall into a place that is, it doesn't justify the scale that needs to be thrown upon it.But when we talk about something like APIs and API exist, um, for a lot of things, they don't exist for everything. And, a lot of times that's for legacy databases, that's for mainframe capability. And this is really where RPA shines and is correctly applied. And especially in big businesses are highly regulated businesses where they can't upgrade to the newest thing, or they can't throw something to the cloud.They have a, you know, their mainframe systems or they have their database systems that have to exist for one reason or the other until there is the motivation and the money and the time to correctly migrate and, and solve for them. So until that day, and again, there's no, API to, to do anything on a, on a mainframe, in this bank or whatnot, it's like, well, Okay. let's just throw RPA on it.Let's, you know, let's have a RPA do this thing, uh, in the way that a human does it, but it can do it 24 7. and an example, or use cases, you work at a bank and, uh, there's no way that InfoSec is going to let you query against this database with, your users that have this account or your customers that have this no way in any organization at a bank.Is InfoSec going to say, oh yeah. sure. Let me give you an Odata query, you know, driver on, you know, and you can just set up your own SQL queries and do whatever they're gonna say no way. In fact, how did you find out about this database in the first place and who are you.How do we solve it? We, we go and say, Okay. how does the user get in here well they open up a mainframe emulator on their desktop, which shows them the mainframe. And then they go in, they click here and they put this number in there, and then they look up this customer and then they switch this value to that value and they say, save.And it's like, okay. cool. That's that RPA can do. And we can do that quite easily. And we don't need to talk about APIs and we don't need to talk about special access or doing queries that makes, you know, Infosec very scared. you know, a great use case for that is, you know, a bank say they, they acquire, uh, a regional bank and they say, cool, you're now part of our bank, but in your systems that are now going to be a part of our systems, you have users that have this value, whereas in our bank, that value is this value here. So now we have to go and change for 30,000 customers this one field to make it line up with our systems. Traditionally you would get a, you know, extract, transform load tool an ETL tool to kind of do that. But for 30,000 customers that might be below the threshold, and this is banking. So it's very regulated and you have to be very, very. Intentional about how you manipulate and move data around.So what do we have to do? okay. We have to hire 10 contractors for six months, and literally what they're going to do eight hours a day is go into the mainframe through the simulator and customer by customer. They're going to go change this value and hit save. And they're looking at an Excel spreadsheet that tells them what customer to go into.And that's going to cost X amount of money and X, you know, for six months, or what we could do is just build a RPA solution, a bot, essentially that goes, and for each line of that Excel spreadsheet, it repeats this one process, open up mainframe emulator, navigate into the customer profile and then changes value, and then shut down and repeat.And It can do that in one week and, and can be built in two, that's the, the dream use case for RPA and that's really kind of, uh, where it would shine.[00:15:20] Jeremy: It sounds like the. best use case for it is an old system, a mainframe system, in COBOL maybe, uh, doesn't have an API. And so, uh, it makes sense to rather than go, okay, how can we get directly into the database?[00:15:38] Alex: How can we build on top of it? Yeah,[00:15:40] Jeremy: we build on top of it? Let's just go through the, user interface that exists, but just automate that process. And, the, you know, the example you gave, it sounds very, very well-defined you're gonna log in and you're going to put in maybe this ID, here's the fields you want to get back.and you're going to save those and you didn't have to make any real decisions, I suppose, in, in terms of, do I need to click this thing or this thing it's always going to be the same path to, to get there.[00:16:12] Alex: exactly. And that's really, you need to be disciplined about your use cases and what those look like. And you can broadly say a use case that I am going to accept has these features, and one of the best ways to do that is say it has to be a binary decision process, which means there is no, dynamic or interpreted decision that needs to, or information that needs to be made.Exactly like that use case it's very binary either is, or it isn't you go in you journey into there. and you change that one thing and that's it there's no oh, well this information says this, which means, and then I have to go do this. Once you start getting in those if else, uh, processes you're, you're going down a rabbit hole and it could get very shaky and that introduces extreme instability in what you're trying to do.And also really expands your development time cause you have to capture these processes and you have to say, okay. tell me exactly what we need to build this bot to do. And for, binary decision processes, that's easy go in here, do this, but nine times out of 10, as you're trying to address this and solution for it, you'll find those uncertainties.You'll find these things where the business says, oh, well, yeah. that happens, you know, one times out of 10 and this is what we need to do. And it's like, well, that's going to break the bot. It, you know, nine times out of 10, this, this spot is going to fall over. this is now where we start getting into, the machine learning and AI, realm.And why RPA, is classified. Uh, sometimes as a subset of the AI or machine learning field, or is a, a pattern within that field is because now that you have this bot or this software that enables you to do a human process, let's enable that bot to now do decision-making processes where it can interpret something and then do something else.Because while we can just do a big tree to kind of address every capability, you're never going to be able to do that. And also it's, it's just a really heavy, bad way to build things. So instead let's throw in some machine learning capability where it just can understand what to do and that's, you know, that's the next level of RPA application is Okay. we've got it. We've, we've gone throughout our organization. We found every kind of binary thing, that can be replaced with an RPA bot. Okay.Now what are the ones that we said we couldn't do? Because it had some of that decision-making that, required too much of a dynamic, uh, intelligence behind it. And let's see if we can address those now that we have this. And so that's, that's the 2.0, in RPA is addressing those non-binary, paths. I would argue that especially in organizations that are big enough to justify bringing in an RPA solution to solve for their processes. They have enough binary processes, binary decision processes to keep them busy.Some people, kind of get caught up in trying to right out the gate, say, we need to throw some machine learning. We need to make these bots really capable instead of just saying, well, we we've got plenty of work, just changing the binary processes or addressing those. Let's just be disciplined and take that, approach.Uh, I will say towards RPA and bots, the best solution or the only solution. When you talk about building a bot is the one that you eventually turn off. So you can say, I built a bot that will go into our mainframe system and update this value. And, uh, that's successful.I would argue that's not successful. When that bot is successful is when you can turn it off because there's an enterprise solution that addresses it. and, and you don't have to have this RPA bot that lives over here and does it instead, you're enterprise, capability now affords for it. And so that's really, I think a successful bot or a successful RPA solution is you've been able to take away the pain point or that human process until it can be correctly addressed by your systems that everyone uses. [00:21:01] Jeremy: from, the business perspective, you know, what are some of the limitations or long-term problems with, with leaving an RPA solution in place?[00:21:12] Alex: that's a, that's a good question. Uh, from the business there, isn't, it's solved for. leaving it in place is other than just servicing it and supporting it. There's no real issue there, especially if it's an internal system, like a mainframe, you guys own that. If it changes, you'll know it, if it changes it's probably being fixed or addressed.So there's no, problem. However, That's not the only application for RPA. let's talk about another use case here, your organization, uses, a bank and you don't have an internal way to communicate it. Your user literally has to go to the bank's website, log in and see information that the bank is saying, Hey, this is your stuff, right?The bank doesn't have an API for their, that service. because that would be scary for the bank. They say, we don't want to expose this to another service. So the human has to go in there, log in, look at maybe a PDF and download it and say, oh, Okay.So that is happens in a browser. So it's a newer technology.This isn't our mainframe built in 1980. You know, browser based it's in the internet and all that, but that's still a valid RPA application, right? It's a human process. There's no API, there's no easy programmatic way to, to solution for it. It would require the bank and your it team to get together and, you know, hate each other. Think about why this, this is so hard. So let's just throw a bot on it. That's going to go and log in, download this thing from the bank's website and then send it over to someone else. And it's going to do that all day. Every day. That's a valid application. And then tomorrow the bank changes its logo. And now my bot is it's confused.Stuff has shifted on the page. It doesn't know where to click anymore. So you have to go in and update that bot because sure enough, that bank's not going to send out an email to you and saying, Hey, by the way, we're upgrading our website in two weeks. Not going to happen, you'll know after it's happened.So that's where you're going to have to upgrade the bot. and that's the indefinite use of RPA is going to have to keep until someone else decides to upgrade their systems and provide for a programmatic solution that is completely outside the, uh, capability of the organization to change. And so that's where the business would say, we need this indefinitely.It's not up to us. And so that is an indefinite solution that would be valid. Right? You can keep that going for 10 years as long, I would say you probably need to get a bank that maybe meets your business needs a little easier, but it's valid. And that would be a good way for the business to say yes, this needs to keep running forever until it doesn't.[00:24:01] Jeremy: you, you brought up the case of where the webpage changes and the bot doesn't work anymore. specifically, you're, you're giving the example of finance and I feel like it would be basically catastrophic if the bot is moving money to somewhere, it shouldn't be moving because the UI has moved around or the buttons not where it expects it to be.And I'm kind of curious what your experience has been with that sort of thing.[00:24:27] Alex: you need to set organizational thresholds and say, this is this something this impacting or something that could go this wrong. It is not acceptable for us to solve with RPA, even though we could do it, it's just not worth it. Some organizations say that's anything that touches customer data healthcare and banking specialists say, yeah, we have a human process where the human will go and issue refunds to a customer, uh, and that could easily be done via RPA solution, but it's fraught with, what, if it does something wrong, it's literally going to impact.Uh, someone somewhere they're their moneys or their, their security or something like that. So that, that definitely should be part of your evaluation. And, um, as an organization, you should set that up early and stick to it and say, Nope, this is outside our purview. Even we can do it. It has these things.So I guess the answer to that is you should never get to that process, but now we're going to talk about, I guess, the actual nuts and bolts of how RPA solutions work and how they can be made to not action upon stuff when it changes or if it does so RPA software, by and large operates by exposing the operating system or the browsers underlying models and interpreting them.Right. So when we talk about something like a, mainframe emulator, you have your RPA software on Microsoft windows. It's going to use the COM the component operating model, to see what is on the screen, what is on that emulator, and it's gonna expose those objects. to the software and say, you can pick these things and click on that and do that.when we're talking about browser, what the RPA software is looking at is not only the COM the, the component object model there, which is the browser, itself. But then it's also looking at the DOM the document object model that is the webpage that is being served through the browser. And it's exposing that and saying, these are the things that you can touch or, operate on.And so when you're building your bots, what you want to make sure is that the uniqueness of the thing that you're trying to access is something that is truly unique. And if it changes that one thing that the bot is looking for will not change. So we let's, let's go back to the, the banking website, right?We go in and we launch the browser and the bot is sitting there waiting for the operating system to say, this process is running, which is what you wanted to launch. And it is in this state, you know, the bot says, okay. I'm expecting this kind of COM to exist. I see it does exist. It's this process, and it has this kind of name and cool Chrome is running. Okay. Let's go to this website. And after I've typed this in, I'm going to wait and look at the DOM and wait for it to return this expected a webpage name, but they could change their webpage name, the title of it, right. They can say, one day can say, hello, welcome to this bank. And the next day it says bank website, all of a sudden your bot breaks it no longer is finding what it was told to expect.So you want to find something unique that will never change with that conceivably. And so you find that one thing on the DOM on the banking website, it's, you know, this element or this tag said, okay, there's no way they're changing that. And so it says cool the page is loaded. Now click on this field, which is log in.Okay. You want to find something unique on that field that won't change when they upgrade, you know, from bootstrap to this kind of, you know, UI framework. that's all well, and good. That's what we call the happy path. It's doing this perfectly. Now you need to define what it should do when it doesn't find these things, which is not keep going or find similar it's it needs to fail fast and gracefully and pass on that failure to someone and not keep going. And that's kind of how we prevent that scary use case where it's like. okay. it's gone in, it's logged into the bank website now it's transactioning, bad things to bad places that we didn't program it for it, Well you unfortunately did not specify in a detailed enough way what it needs to look for.And if it doesn't find that it needs to break, instead of saying that this is close enough. And so, in all things, software engineering, it's that specificity, it's that detail, that you need to hook onto. And that's also where, when we talk about this being a low-code no-code solutions that sometimes RPA is marketed to the business.It's just so often not the case, because yes. It might provide a very user, business, friendly interface for you to build bots. But the knowledge you need to be able to ensure stability and accuracy, um, to build the bots is, is a familiarity that's probably not going to be had in the business. It's going to be had by a developer who knows what the DOM and COM are and how the operating system exposes services and processes and how.JavaScript, especially when we're talking about single page apps and react where you do have this very reactive DOM, that's going to change. You need to be fluent with that and know, not only how HTML tags work and how CSS will change stuff on you in classes, but also how clicking on something on a single page app is as simple as a username input field will dynamically change that whole DOM and you need to account for it. so, it is it's, traditionally not as easy as saying, oh, the business person can just click, click, click, click, and then we have a bot. You'll have a bot, but it's probably going to be break breaking quite often. It's going to be inaccurate in its execution.this is a business friendly user-friendly non-technical tool. And I launch it and it says, what do you want to do? And it says, let me record what you're going to do. And you say, cool.And then you go about you open up Chrome and you type in the browser, and then you click here, click there, hit send, and then you stop recording. The tool says, cool, this is what you've done. Well, I have yet to see a, a solution that is that isn't able to not need further direction or, or defining on that process, You still should need to go in there and say, okay, yeah.you recorded this correctly, but you know, you're not interpreting correctly or as accurate as you need to that field that I clicked on.And if you know, anybody hits, you know, F12 on their keyboard while they have Chrome open and they see how the DOM is built, especially if this is using kind of any kind of template, Webpage software. It's going to have a lot of cruft in that HTML. So while yes, the recording did correctly see that you clicked on the input box.What it's actually seen is that you actually clicked on the div. That is four levels scoped above it, whereas the parent, and there are other things within that as well. And so the software could be correctly clicking on that later, but other things could be in there and you're going to get some instability.So the human or the business, um, bot builder, the roboticist, I guess, would need to say, okay, listen, we need to pare this down, but it's, it's even beyond that. There are concepts that you can't get around when building bots that are unique to software engineering as a concept. And even though they're very basic, it's still sometimes hard for the business user to, they felt to learn that.And I I'm talking concepts as simple as for loops or loops in general where the business of course has, has knowledge of what we would call a loop, but they wouldn't call it a loop and it's not as accurately defined. So they have to learn that. And it's not as easy as just saying, oh Yeah.do a loop. And the business will say, well, what's a loop.Like I know, you know, conceptually what a loop could be like a loop in my, when I'm tying my shoe. But when you're talking about loop, that's a very specific thing in software and what you can do. And when you shouldn't do it, and that's something that these, no matter how good your low code, no code solution might be, it's going to have to afford for that concept.And so a business user is still going to have to have some lower level capability to apply those concepts. And, and I I've yet to see anybody be able to get around that in their RPA solutions.[00:33:42] Jeremy: So in your experience, even though these vendors may sell it as being a tool that anybody can just sit down and use but then you would want a developer to, to sit with them or, or see the result and, and try and figure out, okay, what do you, what do you really want this, this code to do?Um, not just sort of these broad strokes that you were hoping the tool was gonna take care of for you? Yeah.[00:34:06] Alex: that that's exactly right. And that's how every organization will come to that realization pretty quickly. the head of the game ones have said, okay, we need to have a really good, um, COE structure to this robotic operating model where we can have, a software engineering, developer capability that sits with the business, capability.And they can, marry with each other, other businesses who may take, um, these vendors at their word and say, it's a low code meant for business. It just needs to make sure it's on and accessible. And then our business people are just gonna, uh, go in there and do this. They find out pretty quickly that they need some technical, um, guidance to go in because they're building unstable or inaccurate bots.and whether they come to that sooner or later, they, they always come to that. Um, and they realize that, okay, there there's a technical capability And, this is not just RPA. This is the story of all low-code no-code solutions that have ever existed. It always comes around that, while this is a great interface for doing that, and it's very easy and it makes concepts easy.Every single time, there is a technical capability that needs to be afforded. [00:35:26] Jeremy: For the. The web browser, you mentioned the DOM, which is how we typically interact with applications there. But for native applications, you, you briefly mentioned, COM. And I was wondering when someone is writing, um, you know, a bot, uh, what are the sorts of things they see, or what are the primitives they're working with?Like, is there a name attached to each button, each text, field, [00:35:54] Alex: wouldn't that be a great world to live in, so there's not. And, and, as we build things in the DOM. People get a lot better. We've seen people are getting much better about using uniqueness when they build those things so that they can latch onto when things were built or built for the COM or, you know, a .NET for OS that might, that was not no one no one was like oh yeah, we're going to automate this.Or, you know, we need to make this so that this button here is going to be unique from that button over there on the COM they didn't care, you know, different name. Um, so yeah, that is, that is sometimes a big issue when you're using, uh, an RPA solution, you say, okay. cool. Look at this, your calculator app. And Okay. it's showing me the component object model that this was built. It that's describing what is looking at, but none of these nodes have, have a name. They're all, you know, node one node, 1.1 node two, or, or whatnot, or button is just button and there's no uniqueness around it. And that is, you see a lot of that in legacy older software, um, E legacy is things built in 2005, 2010.Um, you do see that, and that's the difficulty at that point. You can still solve for this, but what you're doing is you're using send keys. So instead of saying, Okay.RPA software, open up this, uh, application and then look for. You know, thing, this object in the COM and click on it, it's going to, you know, it can't, there is no uniqueness.So what you say is just open up the software and then just hit tab three times, and that should get you to this one place that was not unique, but we know if you hit tab three times, it's going to get there now. That's all well and good, but there's so many things that could interfere with that and break it.And the there's no context for the bot to grab onto, to verify, Okay. I am there. So any one thing, you could have a pop-up which essentially hijacks your send key, right? And so the bot yes, absolutely hit tab three times and it should be in that one place. It thinks it is, and it hits in enter, but in between the first and second tab, a pop-up happened and now it's latched onto this other process, hits enter. And all of a sudden outlook's opening bot doesn't know that, but it's still going on and it's going to enter in some financial information into oops, an email that it launched because it thought hitting enter again would do so. Yeah.That's, that's where you get that instability. Um, there are other ways around it or other solutions.and this is where we get into the you're using, um, lower level software engineering solutioning instead of doing it exactly how the user does it. When we're talking about the operating system and windows, there are a ton of interop services and assemblies that a, uh, RPA solution can access.So instead of cracking open Excel, double-clicking on Excel workbook waiting for it to load, and then reading the information and putting information in, you can use the, you know, the office 365 or whatnot that, um, interop service assembly and say, Hey, launch this workbook without the UI, showing it, attach to that process that, you know, it is.And then just send to it, using that assembly send information into it. And the human user can't do that. It can't manipulate stuff like that, but the bot can, and it it's the same end as the human users trying. And it's much more efficient and stable because the UI couldn't afford for that kind of stability.So that would be a valid solution. But at that point, you're really migrating into a software engineering, it developer solution of something that you were trying not to do that for. So when is that? Why, you know, why not just go and solve for it with an enterprise or programmatic solution in the first place?So that's the balance. [00:40:18] Jeremy: earlier you're talking about the RPA needs to be something that, uh, that the person is able to do. And it sounds like in this case, I guess there still is a way for the person to do it. They can open up the, the Excel sheet and right it's just that the way the, the RPA tool is doing it is different. Yeah.[00:40:38] Alex: Right. And more efficient and more stable. Certainly. Uh, especially when we're talking about Excel, you have an Excel with, you know, 200,000 lines, just opening that that's, that's your day, that's going to Excel it, just going to take its time opening and visualizing that information for you. Whereas you, you know, an RPA solution doesn't even need to crack that open.Uh, it can just send data right directly to that workbook and it that's a valid solution. And again, some of these processes, it might be just two people at your organization that are essentially doing it. So it's, you know, you don't really, it's not at a threshold where you need an enterprise solution for it, but they're spending 30 minutes of their day just waiting for that Excel workbook to open and then manipulating the data and saving it.And then, oh, their computer crashed. So you can do an RPA solution. It's going to be, um, to essentially build for a more efficient way of doing it. And that would be using the programmatic solution, but you're right. It is doing it in a way that a human could not achieve it. Um, and that again is. The where the discipline and the organizational, aspect of this comes in where it's saying, is that acceptable?Is it okay to have it do things in this way, that are not human, but achieving the same ends. And if you're not disciplined that creeps, and all of a sudden you have a RPA solution that is doing things in a way that where the whole reason to bring that RPA solution is to not have something that did something like that. And that's usually where the stuff falls apart. IT all of a sudden perks their head up and says, wait, I have a lot of connections coming in from this one computer doing stuff very quickly with a, you know, a SQL query. It's like, what is going on? And so all of a sudden, someone built a bot to essentially do a programmatic connection.And it is like, you should not be who gave you this permissions who did this shut down everything that is RPA here until we figure out what you guys went and did. So that's, that's the dance. [00:42:55] Jeremy: it's almost like there's this hidden API or there's this API that you're not intended to use. but in the process of trying to automate this thing, you, you use it and then if your, IT is not aware of it, then things just kind of spiral out of control.[00:43:10] Alex: Exactly. Right. So let's, you know, a use case of that would be, um, we need to get California tax information on alcohol sales. We need to see what each county taxes for alcohol to apply to something. And so today the human users, they go into the California, you know, tobacco, wildlife, whatever website, and they go look up stuff and okay, let's, that's, that's very arduous.Let's throw a bot on that. Let's have a bot do that. Well, the bot developers, smart person knows their way around Google and they find out, well, California has an API for that. instead of the bot cracking open Chrome, it's just going to send this rest API call and it's going to get information back and that's awesome and accurate and way better than anything. but now all of a sudden IT sees connections going in and out. all of a sudden it's doing very quickly and it's getting information coming into your systems in a way that you did not know was going to be, uh, happening. And so while it was all well and good, it's, it's a good way for, the people whose job it is to protect yourself or know about these things, to get very, um, angry, rightly so that this is happening.that's an organizational challenge, uh, and it's an oversight challenge and it's a, it's a developer challenge because, what you're getting into is the problems with having too technical people build these RPA bots, right? So on one hand we have business people who are told, Hey, just crack this thing open and build it.And it's like, well, they don't have enough technical fluency to actually build a stable bot because they're just taking it at face value. Um, on the other hand, you have software engineers or developers that are very technical that say, oh, this process. Yeah. Okay. I can build a bot for that. But what if I used, you know, these interop services, assemblies that Microsoft gives me and I can access it like that.And then I can send an API call over here. And while I'm at it, I'm going to, you know, I'm just going to spin up a server just on this one computer that can do this. When the bot talks to it. And so you have the opposite problem. Now you have something that is just not at all RPA, it's just using the tool to, uh, you know, manipulate stuff, programmatically.[00:45:35] Jeremy: so, as a part of all this, is using the same credentials as a real user, right. You're you're logging in with a username and password. if the form requires something like two factor authentication or, you know, or something like that, like, how does that work since it's not an actual person?[00:45:55] Alex: Right. So in a perfect world, you're correct. Um, a bot is a user. I know a lot of times you'll hear, say, people will be like, oh, hi, I have 20 RPA bots. What they're usually saying is I have 20 automations that are being run for separate processes, with one user's credentials, uh, on a VDI. So you're right.They, they are using a user's credentials with the same permissions that any user that does that process has, that's why it's easy. but now we have these concepts, like two factor authentication, which every organization is using that should require something that exists outside of that bot users environment. And so how do you afford for that in a perfect world? It would be a service account, not a user account and service accounts are governed a little differently. A lot of times service accounts, um, have much more stringent rules, but also allow for things like password resets, not a thing, um, or two factor authentication is not a thing for those.So that would be the perfect solution, but now you're dragging in IT. Um, so, you know, if you're not structurally set up for that, that's going to be a long slog. Uh, so what would want to do some people actually literally have a, we'll have a business person that has their two factor auth for that bot user on their phone.And then just, you know, they'll just go in and say, yeah.that's me. that's untenable. So, um, sometimes what a lot of these, like Microsoft, for instance, allow you to do is to install a two factor authentication, application, um, on your desktop so that when you go to log in a website and says, Hey, type in your password.Cool. Okay. Give me that code. That's on your two factor auth app. The bot can actually launch that. Copy the code and paste it in there and be on its way. But you're right now, you're having to afford for things that aren't really part of the process you're trying to automate. They are the incidentals that also happen.And so you have to build your bot to afford for those things and interpret, oh, I need to do two factor authentication. And a lot of times, especially if you have an entirely business focused PA um, robotic operating model, they will forget about those things or find ways around them that the bot isn't addressing, like having that authenticator app on their phone.that's, um, stuff that definitely needs to be addressed. And sometimes is only, found at runtime like, oh, it's asking for login. And when I developed it, I didn't need to do that because I had, you know, the cookie that said you're good for 30 days, but now, oh, no. [00:48:47] Jeremy: yeah. You could have two factor. Um, you could have, it asking you to check your email for a code. There could be a fraud warning. There's like all sorts of, you know, failure cases that can happen. [00:48:58] Alex: exactly. And those things are when we talk about, uh, third-party vendors, um, third-party provider vendors, like going back to the banking website, if you don't tell them that you're going to be using a bot to get their information or to interface with that website, you're setting yourself up for a bad time because they're going to see that kind of at runtime behavior that is not possible at scale by user.And so you run into that issue at runtime, but then you're correct. There are other things that you might run into at runtime that are not again, part of the process, the business didn't think that that was part of the process. It's just something they do that actually the bot has to afford for. that's part of the journey, uh, in building these. [00:49:57] Jeremy: when you're, when you're building these, these bots, what are the types of tools that, that you've used in the past? Are these commercial, packages, are these open source? Like what, what does that ecosystem look like?[00:50:11] Alex: Yeah, in this space, we have three big ones, which is, uh, automation anywhere UI path and, blue prism. Those are the RPA juggernauts providing this software to the companies that need it. And then you have smaller ones that are, trying to get in there, or provide stuff in a little different way. and you even have now, big juggernauts that are trying to provide for it, like Microsoft with something like power automate desktop.So all of these, say three years ago, all of these softwares existed or all of these RPA solution softwares existed or operated in the same kind of way, where you would install it on your desktop. And it would provide you a studio to either record or define, uh, originally the process that was going to be automated on that desktop when you pushed play and they all kind of expose or operate in the same way they would interpret the COM or the DOM that the operating system provided. Things like task scheduler have traditionally, uh, exposed, uh, and they all kind of did that in the same way. Their value proposition in their software was the orchestration capability and the management of that.So I build a bot to do this, Jim over there built a bot to do that. Okay. This RPA software, not only enabled you to define those processes, But what their real value was is they enable a place where I can say this needs to run at this time on this computer.And it needs to, you know, I need to be able to monitor it and it needs to return information and all that kind of orchestration capability. Now all of these RPA solutions actually exist in that, like everything else in the browser. So instead of installing, you know, the application and launching it and, and whatnot, and the orchestration capability being installed on another computer that looked at these computers and ran stuff on them.Now it's, it's all in the cloud as it were, and they are in the browser. So I go to. Wherever my RPA solution is in my browser. And then it says, okay, cool. You, you still need to install something on the desktop where you want the spot to run and it deploys it there. But I define and build my process in the provided browser studio.And then we're going to give you a capability to orchestrate, monitor, and, uh, receive information on those things that you have, those bots that you have running, and then what they're now providing as well is the ability to tie in other services to your bot so that it has expanded capability. So I'm using automation anywhere and I built my bot and it's going, and it's doing this or that.And automation anywhere says, Hey, that's cool. Wouldn't you like your bot to be able to do OCR? Well, we don't have our own OCR engine, but you probably as an enterprise do. Just use, you know, use your Kofax OCR engine or Hey, if you're really a high speed, why don't you use your Azure cognitive services capability?We'll tie it right into our software. And so when you're building your bot, instead of just cracking open a PDF and send key control C and key control V to do stuff instead, we'll use your OCR engine that you've already paid for to, to understand stuff. And so that's, how they expand, what they're offering, um, into addressing more and more capabilities.[00:53:57] Alex: But now we're, we're migrating into a territory where it's like, well, things have APIs why even build a bot for them. You know, you can just build a program that uses the API and the user can drive this. And so that's where people kind of get stuck. It's they they're using RPA on a, something that just as easily provides for a programmatic solution as opposed to an RPA solution.but because they're in their RPA mode and they say, we can use a bot for everything, they don't even stop and investigate and say, Hey, wouldn't this be just as easy to generate a react app and let a user use this because it has an API and IT can just as easily monitor and support that because it's in an Azure resource bucket.that's where an organization needs to be. Clear-eyed and say, Okay. at this point RPA is not the actual solution. We can do this just as easy over here and let's pursue that. [00:54:57] Jeremy: the experience of making these RPAs. It sounds like you have this browser-based IDE, there's probably some kind of drag and drop set up, and then you, you, you mentioned JavaScript. So I suppose, does that mean you can kind of dive a little bit deeper and if you want to set up specific rules or loops, you're actually writing that in JavaScript.[00:55:18] Alex: Yeah. So not, not necessarily. So, again, the business does not know what an IDE is. It's a studio. Um,so that's, but you're correct. It's, it's an IDE. Um, each, whether we're talking about blue prism or UiPath or automation anywhere, they all have a different flavor of what that looks like and what they enable.Um, traditionally blue prism gave you, uh, a studio that was more shape based where you are using UML shapes to define or describe your process. And then there you are, whereas automation anywhere traditionally used, uh, essentially lines or descriptors. So I say, Hey, I want to open this file. And your studio would just show a line that said open file.You know, um, although they do now, all of them have a shape based way to define your process. Go here, here. You know, here's a circle which represents this. Uh, let's do that. Um, or a way for you to kind of more, um, creatively define it in a, like a text-based way. When we talk about Java script, um, or anything like that, they provide predefined actions, all of them saying, I want to open a file or execute this that you can do, but all of them as well, at least last time I checked also allow you for a way to say, I want to programmatically run something I want to define.And since they're all in the browser, it is, uh, you know, Javascript that you're going to be saying, Hey, run this, this JavaScript, run this function. Um, previously, uh, things like automation anywhere would, uh, let you write stuff in, in .NET essentially to do that capability. But again, now everything's in the browser.So yeah, they do, They do provide for a capability to introduce more low level capability to your automation. That can get dangerous. Uh, it can be powerful and it can be stabilizing, but it can be a very slippery slope where you have an RPA solution bot that does the thing. But really all it does is it starts up and then executes code that you built.[00:57:39] Alex: Like what, what was the, the point in the first place? [00:57:43] Jeremy: Yeah. And I suppose at that point, then anybody who knows how to use the RPA tool, but isn't familiar with that code you wrote, they're just, they can't maintain it [00:57:54] Alex: you have business continuity and this goes back to our, it has to be replicable or close as close to the human process, as you can make. Because that's going to be the easiest to inherit and support. That's one of the great things about it. Whereas if you're a low level programmer, a dev who says, I can easily do this with a couple of lines of, you know, dot net or, you know, TypeScript or whatever.And so the bot just starts up in executes. Well, unless someone that is just as proficient comes along later and says, this is why it's breaking you now have an unsupportable business, solution. that's bad Juju. [00:58:38] Jeremy: you have the software engineers who they want to write code. then you have the people who are either in business or in IT that go, I don't want to look at your code.I don't want to have to maintain it. Yeah. So it's like you almost, if you're a software engineer coming in, you almost have to fight that urge to, to write anything yourself and figure out, okay, what can I do with the tool set and only go to code if I can't do it any other way.[00:59:07] Alex: That's correct. And that's the, it takes discipline. more often than not, not as fun as writing the code where you're like, I can do this. And this is really where the wheels come off is. You went to the business that is that I have this process, very simple. I need to do this and you say, cool, I can do that.And then you're sitting there writing code and you're like, but you know what? I know what they really want to do. And I can write that now. And so you've changed the process and while it is, and nine times out of 10, the business will be like, oh, that's actually what we wanted. The human process was just as close as we could get nothing else, but you're right.That's, that's exactly what we needed. Thank you nine times out of 10. They'll love you for that. But now you own their process. Now you're the one that defined it. You have to do the business continuity. You have to document it. And when it falls over, you have to pick it back up and you have to retrain.And unless you have an organizational capacity to say, okay, I've gone in and changed your process. I didn't automate it. I changed it. Now I have to go in and tell you how I changed it and how you can do it. And so that, unless you have built your robotic operating model and your, your team to afford for that, your developer could be writing checks bigger than they can cash.Even though this is a better capability. [01:00:30] Jeremy: you, you sort of touched on this before, and I think this is probably the, the last topic we'll cover, but you've been saying how the end goal should be to not have to use the RPAs anymore And I wonder if you have any advice for how to approach that process and, and what are some of the mistakes you've seen people make[01:00:54] Alex: Mm Hmm. I mean the biggest mistake I've seen organizations make, I think is throwing the RPA solution out there, building bots, and they're great bots, and they are creating that value. They're enabling you to save money and also, enabling your employees to go on and do better, more gratifying work. but then they say, that's, it that's as far as we're going to think, instead of taking those savings and saying, this is for replacing this pain point that we had to get a bot in the first place to do so.That's a huge common mistake. Absolutely understandable if I'm a CEO or even, you know, the person in charge of, you know, um, enterprise transformation. Um, it's very easy for me to say, ha victory, here's our money, here's our savings. I justified what we've done. Go have fun. Um, and instead of saying, we need to squirrel this money away and give it to the people that are going to change the system. So that, that's definitely one of the biggest things.The problem with that is that's not realized until years later when they're like, oh, we're still supporting these bots. So it is upfront having a turnoff strategy. When can we turn this bot off? What is that going to look like? Does it have a roadmap that will eventually do that?And that I think is the best way. And that will define what kind of processes you do indeed build bots for is you go to it and say, listen, we've got a lot of these user processes, human processes that are doing this stuff. Is there anything on your roadmap that is going to replace that and they say, oh yeah you know, in three years we're actually going to be standing up our new thing.We're going to be converting. And part of our, uh, analysis of the solution that we will eventually stand up will be, does it do these things? And so yes, in three years, you're good. And you say, cool, those are the processes I'm going to automate and we can shut those off. That's your point of entry for these things not doing that leads to bots running and doing things even after there is a enterprise solution for that. And more often than not, I would say greater than five times out of 10, when we are evaluating a process to build a bot for easily five times out of 10, we say, whoa, no, actually there's, you don't even need to do this.Our enterprise application can do this. you just need retraining, because your process is just old and no one knew you were doing this. And so they didn't come in and tell you, Hey, you need to use this.So that's really a lot of times what, what the issue is. And then after that, we go in and say, Okay.no, there's, there's no solution for this. This is definitely a bot needs to do this. Let's make sure number one, that there isn't a solution on the horizon six months to a year, because otherwise we're just going to waste time, but let's make sure there is, or at least IT, or the people in charge are aware that this is something that needs to be replaced bot or no bot.And so let's have an exit strategy. Let's have a turn-off strategy. When you have applications that are relatively modern, like you have a JIRA, a ServiceNow, you know, they must have some sort of API and it may just be that nobody has come in and told them, you just need to plug these applications together.[01:04:27] Alex: And so kind of what you're hitting on and surfacing is the future of RPA. Whereas everything we're talking about is using a bot to essentially bridge a gap, moving data from here to there that can't be done, programmatically. Accessing something from here to there that can't be done programmatically.So we use a bot to do it. That's only going to exist for so long. Legacy can only be legacy for so long, although you can conceivably because we had that big COBOL thing, um, maybe longer than we we'd all like, but eventually these things will be. upgraded. and so either the RPA market will get smaller because there's less legacy out there.And so RPA as a tool and a solution will become much more targeted towards specific systems or we expand what RPA is and what it can afford for. And so that I think is more likely the case. And that's the future where bots or automations aren't necessary interpreting the COM and the DOM and saying, okay, click here do that.But rather you're able to quickly build bots that utilize APIs that are built in and friendly. And so what we're talking about there is things like Appian or MuleSoft, which are these kind of API integrators are eventually going to be classified as RPA. They're going to be within this realm.And I think, where, where you're seeing that at least surfaced or moving towards is really what Microsoft's offering in that, where they, uh, they have something called power automate, which essentially is it just a very user-friendly way to access API. that they built or other people have built.So I want to go and I need to get information to service now, service now has an API. Yeah. Your, IT can go in and build you a nice little app that does a little restful call to it, or a rest API call to it gets information back, or you can go in and, you know, use Microsoft power automate and say, okay, I want to access service now.And it says, cool. These are the things you can do. And I say, okay, I just want to put information in this ticket and we're not talking about get or patch or put, uh, or anything like that. We're just saying, ah, that's what it's going to do. And that's kind of what Microsoft is, is offering. I think that is the new state of RPA is being able to interface in a user-friendly way with APIs. Cause everything's in the browser to the point. where, you know, Microsoft's enabling add ins for Excel to be written in JavaScript, which is just the new frontier. Um, so that's, that's kind of going to be the future state of this. I believe. [01:07:28] Jeremy: so, so moving from RPAs being this thing, that's gonna click through website, click through, um, a desktop application instead it's maybe more of this high, higher level tool where the user will still get this, I forget the term you used, but this tool to build a workflow, right. A studio. Okay. Um, and instead of saying, oh, I want this to click this button or fill in this form.It'll be, um, I want to get this information from service now. And I want to send a message using that information to slack or to Twilio, or, um, you're basically, talking directly to these different services and just telling it what you want and where it should go.[01:08:14] Alex: That's correct. So, as you said, everything's going to have an API, right? Seemingly everything has an API. And so instead of us, our RPA bots or solutions being UI focused, they're going to be API focused, um, where it doesn't have to use the user interface. It's going to use the other service. And again, the cool thing about APIs in that way is that it's not, directly connecting to your data source.It's the same as your UI for a user. It sits on top of it. It gets the request and it correctly interprets that. And does it the same thing with your UI where I say I click here and you know, wherever it says. okay. yeah. You're allowed to do that. Go ahead. So that's kind of that the benefit to that.Um, but to your point, the, the user experience for whether you're using a UI or API to build up RPA bot, it's going to be the same experience for the user. And then at this point, what we're talking about, well, where's the value offering or what is the value proposition of RPA and that's orchestration and monitoring and data essentially.we'll take care of hosting these for you. we'll take care of where they're going to run, uh, giving you a dashboard, things like that.[01:09:37] Alex: That's a hundred percent correct. It's it's providing a view into that thing and letting the business say, I want to no code this. And I want to be able to just go in and understand and say, oh, I do want to do that. I'm going to put these things together and it's going to automate this business process that I hate, but is vital, and I'm going to save it, the RPA software enables you to say, oh, I saw they did that. And I see it's running and everything's okay in the world and I want to turn it on or off. And so it's that seamless kind of capability that that's what that will provide. And I think that's really where it isn't, but really where it's going. Uh, it'll be interesting to see when the RPA providers switch to that kind of language because currently and traditionally they've gone to business and said, we can build you bots or no, no, your, your users can build bots and that's the value proposition they can go in.And instead of writing an Excel where you had one very, very advanced user. Building macros into Excel with VBA and they're unknown to the, the IT or anybody else instead, you know, build a bot for it. And so that's their business proposition today. Instead, it's going to shift, and I'd be interested to see when it shifts where they say, listen, we can provide you a view into those solutions and you can orchestrate them in, oh, here's the studio that enables people to build them.But really what you want to do is give that to your IT and just say, Hey, we're going to go over here and address business needs and build them. But don't worry. You'll be able to monitor them and at least say, yeah okay. this is, this is going.[01:11:16] Jeremy: Yeah. And that's a, a shift. It sounds like where RPA is currently, you were talking about how, when you're configuring them to click on websites and GUIs, you really do still need someone with the software expertise to know what's going on. but maybe when you move over to communicating with API, Um, maybe that won't be as important maybe, somebody who just knows the business process really can just use that studio and get what they need.[01:11:48] Alex: that's correct. Right. Cause the API only enables you to do what it defined right. So service now, which does have a robust API, it says you can do these things the same as a user can only click a button that's there that you've built and said they can click. And so that is you can't go off the reservation as easy with that stuff, really what's going to become prime or important is as no longer do I actually have an Oracle server physically in my location with a database.Instead I'm using Oracle's cloud capability, which exists on their own thing. That's where I'm getting data from. What becomes important about being able to monitor these is not necessarily like, oh, is it falling over? Is it breaking? It's saying, what information are you sending or getting from these things that are not within our walled garden.And that's really where, it or the P InfoSec is, is going to be maybe the main orchestrator owner of RPA, because they're, they're going to be the ones to say you can't, you can't get that. You're not allowed to get that information. It's not necessarily that you can't do it. Um, and you can't do it in a dangerous way, but it's rather, I don't want you
“Understanding what makes code readable from a cognitive perspective will help you design better. There are so many areas of programming where knowing something about knowing is just going to make you happier and more effective." Felienne Hermans is the author of “The Programmer's Brain” and an Associate Professor at Leiden University. She is also the creator of the Hedy programming language, the co-founder of Joy of Coding conference, and a host at Software Engineering Radio podcast. In this episode, Felienne explained why programming is one of the most demanding cognitive activities and described the three different cognitive processes involved. We discussed why code reading is hard and how to get better at it, the connection between programming and spoken languages, naming things and why it is so important to get it right, and how to avoid having bugs in our thinking. Listen out for: Career Journey - [00:05:09] Kids Learning Programming - [00:06:15] Writing “The Programmer's Brain” - [00:08:58] Programming as a Demanding Cognitive Ability - [00:11:19] Code Reading is So Hard- [00:16:23] 3 Cognitive Processes - [00:19:32] How to Improve Code Reading Skills - [00:22:09] Power of Chunking - [00:25:07] Learning Programming and Spoken Language - [00:27:35] Bugs in Thinking - [00:31:02] Naming Things is Hard - [00:34:32] Code with Bad Names Has More Bugs - [00:37:36] Mental Models - [00:41:31] Other Cognitive Aspects - [00:42:45] Impact of Interruptions - [00:44:37] 2 Tech Lead Wisdom - [00:46:46] _____ Felienne Hermans's Bio Felienne Hermans is an Associate Professor at the Leiden Institute of Advanced Computer Science at Leiden University, where she heads the PERL research group, focused on programming education. She also teaches prospective computer science teachers at the Vrije Universiteit Amsterdam. Felienne is the creator of the Hedy programming language, and was one of the founders of the Joy of Coding conference. Since 2016, she has been a host at Software Engineering Radio, one of the most popular software engineering podcasts on the web. Felienne is also the author of “The Programmer's Brain” a book that helps programmers understand how their brains work and how to use it more effectively. In 2021, Felienne was awarded the Dutch Prize for ICT research. Felienne is a member the board of I&I, the Dutch association of high-school computer science teachers, and of TC39, the committee that designs JavaScript. Follow Felienne: Website – https://www.felienne.com/ Twitter – @Felienne LinkedIn – https://www.linkedin.com/in/felienne Our Sponsor Are you looking for a new cool swag? Tech Lead Journal now offers you some swags that you can purchase online. These swags are printed on-demand based on your preference, and will be delivered safely to you all over the world where shipping is available. Check out all the cool swags by visiting https://techleadjournal.dev/shop. Like this episode? Subscribe on your favorite podcast app and submit your feedback. Follow @techleadjournal on LinkedIn, Twitter, and Instagram. Pledge your support by becoming a patron. For more info about the episode (including quotes and transcript), visit techleadjournal.dev/episodes/61.
Josef Strzibny is the author of Deployment from Scratch and a current Fedora contributor. He previously worked on the Developer Experience team at Red Hat.This episode originally aired on Software Engineering Radio.Links: Deployment from Scratch @strzibnyj systemd Introduction to Control Groups SELinux Fedora Rocky Linux Puma AppSignal Datadog Rollbar Skylight Bootstrapping a multiplayer server with Elixir at X-Plane StackExchange Performance Chruby Password Safe Vault Rails Custom Credentials Transcript: You can help edit this transcript on GitHub. [00:00:00] Jeremy: Today, I'm talking to Josef Strzibny.He's the author of the book deployment from scratch. A fedora contributor. And he previously worked on the developer experience team at red hat. Josef welcome to software engineering radio. [00:00:13] Josef: Uh, thanks for having me. I'm really happy to be here. There are a lot of commercial services for hosting applications these days. One that's been around for quite a while is Heroku, but there's also services like render and Netlify. why should a developer learn how to deploy from scratch and why would a developer choose to self host an application [00:00:37] Josef: I think that as a web engineers and backend engineers, we should know a little bit more how we run our own applications, that we write. but there is also a business case, right?For a lot of people, this could be, uh, saving money on hosting, especially with managed databases that can go, high in price very quickly. and for people like me, that apart from daily job have also some side project, some little project they want to, start and maybe turn into a successful startup, you know but it's at the beginning, so they don't want to spend too much money on it, you know?And, I can deploy and, serve my little projects from $5 virtual private servers in the cloud. So I think that's another reason to look into it. And business wise, if you are, let's say a bigger team and you have the money, of course you can afford all these services. But then what happened to me when I was leading a startup, we were at somewhere (?) and people are coming and asking us, we need to self host their application.We don't trust the cloud. And then if you want to prepare this environment for them to host your application, then you also need to know how to do it. Right? I understand completely get the point of not knowing it because already backend development can be huge.You know, you can learn so many different databases, languages, whatever, and learning also operations and servers. It can be overwhelming. I want to say you don't have to do it all at once. Just, you know, learn a little bit, uh, and you can improve as you go. Uh, you will not learn everything in a day. [00:02:28] Jeremy: So it sounds like the very first reason might be to just have a better understanding of, of how your applications are, are running. Because even if you are using a service, ultimately that is going to be running on a bare machine somewhere or run on a virtual machine somewhere. So it could be helpful maybe for just troubleshooting or a better understanding how your application works.And then there's what you were talking about with some companies want to self-host and, just the cost aspect. [00:03:03] Josef: Yeah. for me, really, the primary reason would be to understand it because, you know, when I was starting programming, oh, well, first of there was PHP and I, I used some shared hosting thing, just some SFTP. Right. And they would host it for me. It was fine. Then I switched to Ruby on Rails and at the time, uh, people were struggling with deploying it and I was asking myself, so, okay, so you ran rails s like for a server, right. It starts in development, but can you just do that on the server for, for your production? You know, can you just rails server and is that it, or is there more to it? Or when people were talking about, uh, Linux hardening, I was like, okay, but you know, your Linx distribution have some good defaults, right.[00:03:52] Jeremy: So why do you need some further hardening? What does it mean? What to change. So for me, I really wanted to know, uh, the reason I wrote this book is that I wanted to like double down on my understanding that I got it right. Yeah, I can definitely relate in the sense that I've also used Ruby and Ruby on rails as well. And there's this, this huge gap between just learning how to run it in a development environment on your computer versus deploying it onto a server and it's pretty overwhelming. So I think it's, it's really great that, that you're putting together a book that, that really goes into a lot of these things that I think that usually aren't talked about when people are just talking about learning a language. [00:04:39] Josef: you can imagine that a lot of components you can have into this applications, right? You have one database, maybe you have more databases. Maybe you have a redis key-value store. Uh, then you might have load balancers and all that jazz. And I just want to say that there's one thing I also say in the book, like try to keep it simple. If you can just deploy one server, if you don't need to fulfill some SLE (SLA) uh, uptime, just do the simplest thing first, because you will really understand it. And when there was an error you will know how to fix it because when you make things complex for you, then it will be kind of lost, very quickly. So I try to really make things as simple as possible to stay on top of them.[00:05:25] Jeremy: I think one of the first decisions you have to make, when you're going to self host an application is you have to decide which distribution you're going to use. And there's things like red hat and Ubuntu, and Debian and all these different distributions. And I'm wondering for somebody who just wants to deploy their application, whether that's rails, Django, or anything else, what are the key differences between them and, and how should they choose a distribution?[00:05:55] Josef: if you already know one particular distribution, there's no need to constantly be on the hunt for a more shiny thing, you know, uh, it's more important that you know it well and, uh, you are not lost. Uh, that said there are differences, you know, and there could be a long list from goals and philosophy to who makes it whether community or company, if it's showing distribution or not, lack of support, especially for security updates, uh, the kind of init systems, uh, that is used, the kind of c library that is used packaging format, package manager, and for what I think most people will care about number of packages and the quality or version, right?Because essentially the distribution is distribution of software. So you care about the software. If you are putting your own stuff, on top of it. you maybe don't care. You just care about it being a Linux distribution and that's it. That's fine. But if you are using more things from the distribution, you might star, start caring a little bit more.You know, other thing is maybe a support for some mandatory access control or in the, you know, world of Docker, maybe the most minimal image you can get established because you will be building a lot of, a lot of times the, the Docker image from the Docker file. And I would say that two main family of systems that people probably know, uh, ones based on Fedora and those based on Debian, right from Fedora, you have, uh, Red Hat Enterprise Linux, CentOS, uh, Rocky Linux.And on the Debian side you have Ubuntu which is maybe the most popular cloud distribution right now. And, uh, of course as a Fedora packager I'm kind of, uh, in the fedora world. Right. But if I can, if I can mention two things that I think makes sense or like our advantage to fedora based systems. And I would say one is modular packages because it's traditional systems for a long time or for only one version of particular component like let's say postgresql, uh, or Ruby, uh, for one big version.So that means, uh, either it worked for you or it didn't, you know, with databases, maybe you could make it work. With ruby and python versions. usually you start looking at some version manager to compile their own version because the version was old or simply not the same, the one your application uses and with modular packages, this changed and now in fedora and RHEL and all this, We now have several options to install. There are like four different versions of postgresql for instance, you know, four different versions of redis, but also different versions of Ruby, python, of course still, you don't get all of the versions you want. So for some people, it still might not work, but I think it's a big step forward because even when I was working at Red Hat, we were working on a product called software collections.This was kind of trying to solve this thing for enterprise customers, but I don't think it was particularly a good solution. So I'm quite happy about this modularity effort, you know, and I think the modular packages, I look into them recently are, are very better, but I will say one thing don't expect to use them in a way you use your regular version manager for development.So, if you want to be switching between versions of different projects, that's not the use case for them, at least as I understand it, not for now, you know, but for server that's fine. And the second, second good advantage of Fedora based system, I think is good initial SELinux profile settings, you know, SE Linux is security enhanced Linux.What it really is, is a mandatory access control. So, on usual distribution, you have a discrete permissions that you set that user set themselves on their directories and files, you know, but this mandatory access control means that it's kind of a profile that is there beforehand, the administrators prepares. And, it's kind of orthogonal to those other security, uh, boundaries you have there. So that will help you to protect your most vulnerable, uh, processes because especially with SELinux, there are several modes. So there is, uh, MLS (?) mode for like that maybe an army would use, you know, but for what we use, what's like the default, uh, it's uh, something called targeted policy.And that means you are targeting the vulnerable processes. So that means your services that we are exposing to external world, like whether it's SSH, postgresql, nginx, all those things. So you have a special profile for them. And if someone, some, attacker takes over, of your one component, one process, they still cannot do much more than what the component was, uh, kind of prepared to do.I think it's really good that you have this high-quality settings already made because other distributions, they might actually be able to run with SELinux. But they don't necessarily provide you any starting points. You will have to do all your policies yourself. And SELinux is actually a quite complex system, you know, it's difficult.It's even difficult to use it as a user. Kind of, if you see some tutorials for CentOS, uh, you will see a lot of people mentioned SELinux maybe even turning it off, there's this struggle, you know, and that's why I also, use and write like one big chapter on SELinux to get people more familiar and less scared about using it and running with it.[00:12:00] Jeremy: So SELinux is, it sounds like it's basically something where you have these different profiles for different types of applications. You mentioned SSH, for example, um, maybe there could be one for nginx or, or one for Postgres. And they're basically these collections of permissions that a process should be able to have access to whether that's, network ports or, file system permissions, things like that.And they're, they're kind of all pre-packaged for you. So you're saying that if you are using a fedora based distribution, you could, you could say that, I want SSH to be allowed. So I'm going to turn on this profile, or I want nginx to be used on this system. So I'm going to turn on this profile and those permissions are just going to be applied to the process that that needs it is that is that correct?[00:12:54] Josef: Well, actually in the base system, there will be already a set of base settings that are loaded, you know, and you can make your own, uh, policy models that you can load. but essentially it works in a way that, uh, what's not really permitted and allowed is disallowed.that's why it can be a pain in the ass. And as you said, you are completely correct. You can imagine it as um nginx as a reverse proxy, communicating with Puma application server via Unix socket, right? And now nginx will need to have access to that socket to be even being able to write to a Unix socket and so on.So things like that. Uh, but luckily you don't have to know all these things, because it's really difficult, especially if you're starting up. Uh, so there are set of tools and utilities that will help you to use SELinux in a very convenient way. So what you, what you do, what I will suggest you to do is to run SELinux in a permissive mode, which means that, uh, it logs any kind of violations that application does against your base system policies, right?So you will have them in the log, but everything will work. Your application will work. So we don't have to worry about it. And after some time running your application, you've ran these utilities to analyze these logs and these violations, and they can even generate a profile for you. So you will know, okay, this is the profile I need.This is the access to things I need to add. once after you do that, if, if there will be some problems with your process, if, if some article will try to do something else, they will be denied.That action is simply not happening. Yeah. But because of the utilities, you can kind of almost automate how, how you make a profile and that way is much, much easier.Yeah. [00:14:54] Jeremy: So, basically the, the operating system, it comes with all these defaults of things that you're allowed to do and not allowed to do, you turn on this permissive flag and it logs all the things that it would have blocked if you, were enforcing SELinux. And then you can basically go in and add the things that are, that are missing.[00:15:14] Josef: Yes exactly right. [00:15:16] Jeremy: the, next thing I'd like to go into is, one of the things you talk about in the book is about how your services, your, your application, how it runs, uh, as, as daemons. And I wonder if you could define what a daemon is?[00:15:33] Josef: Uh, you can think about them as a, as a background process, you know, something that continuously runs In the background. Even if the virtual machine goes down and you reboot, you just want them again to be restarted and just run at all times the system is running.[00:15:52] Jeremy: And for things like an application you write or for a database, should the application itself know how to run itself in the background or is that the responsibility of some operating system level process manager?[00:16:08] Josef: uh, every Linux operating system has actually, uh, so-called init system, it's actually the second process after the Linux kernel that started on their system, it has a process ID of one. And it's essentially the parent of all your processes because on Linux, you have always parents and children. Because you use forking to make new, make new processes. And so this is your system process manager, but obviously systemd if it's your system process manager, you already trusted with all the systems services, you can also trust them with your application, right? I mean, who else would you trust even if you choose some other purchase manager, because there are many, essentially you would have to wrap up that process manager being a systemd service, because otherwise there is, you wouldn't have this connection of systemd being a supreme supervisor of your application, right?When, uh, one of your services struggle, uh, you want it to be restarted and continue. So that's what a systemd could do for you. If you, you kind of design everything as a systemd service, for base packages like base postgresql they've already come with a systemd services, very easy to use. You just simply start it and it's running, you know, and then for your application, uh, you would write a systemd service, which is a little file.There are some directives it's kind of a very simple and straightforward, uh, because before, before systemd people were using the services with bash and it was kind of error prone, but now with systemd it's quite simple. They're just a set of directives, uh, that you learn. you tell systemd, you know, under what user you should run, uh, what working directory you want it to be running with.Uh, is there a environment file? Is there a pidfile? And then, uh, A few other things. The most important being a directive called ExecStart, which tells systemd what process to start, it will start a process and it will simply oversee oversee it and will look at errors and so on. [00:18:32] Jeremy: So in the past, I know there used to be applications that were written where the application itself would background itself. And basically that would allow you to run it in the background without something like a systemd. And so it sounds like now, what you should do instead is have your application be built to just run in the foreground.and your process manager, like systemd can be configured to, um, handle restarting it, which user is running it. environment variables, all sorts of different things that in the past, you might've had to write in your own bash script or write into the application itself.[00:19:14] Josef: And there's also some. other niceties about systemd because for example, you can, you can define how reloading should work. So for instance, you've just changed some configuration and you've want to achieve some kind of zero downtime, ah, change, zero downtime deploy, you know, uh, you can tell systemd how this could be achieved with your process and if it cannot be achieved, uh, because for instance, uh, Puma application server.It can fork processes, and it can actually, it can restart those processes in a way that it will be zero downtime, but when you want to change to evolve (?) Puma process. So what do you do, right? And uh systemd have this nice uh thing called, uh, socket activation. And with system socket activation, you can make another unit.Uh, it's not a service unit. It's a socket unit there are many kinds of units in systemd. And, uh, you will basically make a socket unit that would listen to those connections and then pass them to the application. So while application is just starting and then it could be a completely normal restart, which means stopping, starting, uh, then it will keep the connections open, keep the sockets open and then pass them. when the application is ready to, to process them.[00:20:42] Jeremy: So it sounds like if, and the socket you're referring to these would be TCP sockets, for example, of someone trying to access a website.[00:20:53] Josef: Yes, but actually worked with Unix. Uh, socket as well. Okay. [00:20:58] Jeremy: so in, in that example, Um, let's say a user is trying to go to a website and your service is currently down. You can actually configure systemd to, let the user connect and, and wait for another application to come back up and then hand that connection off to the application once it's, once it's back up.[00:21:20] Josef: yes, exactly. That, yeah. [00:21:23] Jeremy: you're basically able to remove some of the complexity out of the applications themselves for some of these special cases and, and offload those to, to systemd.[00:21:34] Josef: because yeah, otherwise you would actually need a second server, right? Uh, you will have to, uh, start second server, move traffic there and upgrade or update your first server. And exchange them back and with systemd socket activation you can avoid doing that and still have this final effect of zero downtime deployment. [00:21:58] Jeremy: So the, this, this introduction of systemd as the process manager, I think there's, this happened a few years ago where a lot of Linux distributions moved to using systemd and there, there was some, I suppose, controversy around that. And I'm kind of wondering, um, if you have any perspective on, on why there's some people who, really didn't want that to happen, know, why, why that's something people should worry about or, or, or not.[00:22:30] Josef: Yeah. there were, I think there were few things, One one was for instance, the system logging that suddenly became a binary format and you need a special utility to, to read it. You know, I mean, it's more efficient, it's in a way better, but it's not plain text rich, all administrators prefer or are used to. So I understand the concern, you know, but it's kind of like, it's fine.You know, at least to me, it it's fine. And the second, the second thing that people consistently force some kind of system creep because uh systemd is trying to do more and more every year. So, some people say it's not the Unix way, uh systemd should be very minimal in its system and not do anything else.It's it's partially true, but at the same time, the things that systemd went into, you know, I think they are essentially easier and nice to use. And this is the system, the services I can say. I certainly prefer how it's done now, [00:23:39] Jeremy: Yeah. So it sounds like we've been talking about systemd as being this process manager, when the operating system first boots systemd starts, and then it's responsible for starting, your applications or other applications running on the same machine. Uh, but then it's also doing all sorts of other things.Like you talked about that, that socket activation use case, there's logging. I think there's also, scheduled jobs. There's like all sorts of other things that are part of systemd and that's where some people, disagree on whether it should be one application that's handling all these things.[00:24:20] Josef: Yeah. Yeah. Uh, you're right with the scheduling job, like replacing Cron, you have, now two ways how to do it. But, you can still pretty much choose, what you use, I mean, I still use Cron, so I don't see a trouble there. we'll see. We'll see how it goes. [00:24:40] Jeremy: One of the things I remember I struggled with a little bit when I was learning to deploy applications is when you're working locally on your development machine, um, you have to install a language runtime in a lot of cases, whether that's for Ruby or Python, uh, Java, anything like that. And when someone is installing on their own machine, they often use something like a, a version manager, like for example, for Ruby there's rbenv and, for node, for example, there's, there's NVM, there's all sorts of, ways of installing language, run times and managing the versions.How should someone set up their language runtime on a server? Like, would they use the same tools they use on their development machine or is it something different.[00:25:32] Josef: Yeah. So there are several ways you can do, as I mentioned before, with the modular packages, if you find the version there. I would actually recommend try to do it with the model package because, uh, the thing is it's so easy to install, you know, and it's kind of instant. it takes no time on your server.It's you just install it. It's a regular package. same is true when building a Docker, uh, docker image, because again, it will be really fast. So if you can use it, I would just use that because it's like kind of convenient, but a lot of people will use some kind of version manager, you know, technically speaking, they can only use the installer part.Like for instance, chruby with ruby-install to install new versions. Right. but then you would have to reference these full paths to your Ruby and very tedious. So what I personally do, uh, I just really set it up as if I am on a developer workstation, because for me, the mental model of that is very simple.I use the same thing, you know, and this is true. For instance, when then you are referencing what to start in this ExecStart directive and systedD you know, because you have several choices. For instance, if you need to start Puma, you could be, you could be referencing the address that is like in your user home, .gem, Ruby version number bin Puma, you know, or you can use this version manager, they might have something like chruby-exec, uh, to run with their I (?) version of Ruby, and then you pass it, the actual Puma Puma part, and it will start for you, but what you can also do.And I think it's kind of beautiful. You can do it is that you can just start bash, uh, with a login shell and then you just give it the bundle exec Puma command that you would use normally after logging. Because if you install it, everything, normally, you know, you have something.you know, bashprofile that will load that environment that will put the right version of Ruby and suddenly it works.And I find it very nice. Because even when you are later logging in to your, your, uh, box, you log in as that user as that application user, and suddenly you have all the environment, then it just can start things as you are used to, you know, no problem there. [00:28:02] Jeremy: yeah, something I've run in into the past is when I would install a language runtime and like you were kind of describing, I would have to type in the, the full path to, to get to the Ruby runtime or the Python runtime. And it sounds like what you're saying is, Just install it like you would on your development machine.And then in the systemd configuration file, you actually log into a bash shell and, and run your application from the bash shell. So it has access to the, all the same things you would have in an interactive, login environment. Is that, is that right?[00:28:40] Josef: yeah, yeah. That's exactly right. So it will be basically the same thing. And it's kind of easy to reason about it, you know, like you can start with that might be able to change it later to something else, but, it's a nice way of how to do it. [00:28:54] Jeremy: So you mentioned having a user to run your application. And so I'm wondering how you decide what Linux users should run your applications. Are you creating a separate user for each application you run? Like, how are you making those decisions?[00:29:16] Josef: yes, I am actually making a new user for, for my application. Well, at least for the part of the application, that is the application server and workers, you know, so nginx um, might have own user, postgresql might have his own user, you know, I'm not like trying to consolidate that into one user, but, uh, in terms of rails application, like whatever I run Puma or whenever I run uh sidekiq, that will be part of the one user, you know, application user.Uh, and I will appropriately set the right access to the directories. Uh, so it's isolated from everything else, [00:30:00] Jeremy: Something that I've seen also when you are installing Ruby or you're installing some other language runtime, you have. The libraries, like in the case of Ruby there's there's gems. and when you're on your development machine and you install these, these gems, these packages, they, they go into the user's home directory.And so you're able to install and use them without having let's say, um, sudo or root access. is that something that you carry over to your, your deployments as well, or, or do you store your, your libraries and your gems in some place that's accessible outside of that user? I'm just wondering how you approach it.[00:30:49] Josef: I would actually keep it next to next to my application, this kind of touches maybe the question or where to put your application files on the system. so, uh, there is something called FHS, file system hierarchy standard, you know, that, uh, Linux distributions use, they, of course use it with some little modifications here and there.And, uh, this standard is basically followed by packagers and enforced in package repositories. Uh, but other than that, it's kind of random, you know, it could be a different path and, uh, it says where certain files should go. So you have /home we have /usr/bin for executables. /var for logs and so on and so on.And now when you want to put your, your application file somewhere, you are thinking about to put them, right. Uh, you have essentially, I think like three options, for, for one, you can put it to home because it's, as we talked about, I set up a dedicated user for that application. So it could make sense to put it in home.Why I don't like putting it at home is because there are certain labeling in SELinux that kind of, makes your life more difficult. it's not meant to be there, uh, essentially on some other system. Uh, without SELinux, I think it works quite fine. I also did before, you know, it's not like you cannot do it.You can, uh, then you have, the, kind of your web server default location. You know, like /usr/share/nginx/html, or /var/www, and these systems will be prepared for you with all these SELinux labeling. So when you put files there, uh, things will mostly work, but, and I also saw a lot of people do that because this particular reason, what I don't like about it is that if nginx is just my reverse proxy, you know, uh, it's not that I am serving the files from there.So I don't like the location for this reason. If it will be just static website, absolutely put it there that's the best location. then you can put it to some arbitrary location, some new one, that's not conflicting with anything else. You know, if you want to follow the a file system hierarchy standard, you put it to /srv, you know, and then maybe slash the name of the application or your domain name, hostname you can choose, what you like.Uh, so that's what I do now. I simply do it from scratch to this location. And, uh, as part of the SELinux, I simply make a model, make a, make a profile, uh, an hour, all this paths to work. And So to answer your question where I would put this, uh, gems would actually go to this, to this directory, it will be like /apps/gems, for instance.there's a few different places people could put their application, they could put it in the user's home folder, but you were saying because of the built-in SELinux rules SELinux is going to basically fight you on that and prevent you from doing a lot of things in that folder.[00:34:22] Jeremy: what you've chosen to do is to, to create your own folder, that I guess you described it as being somewhat arbitrary, just being a folder that you consistently are going to use in all your projects. And then you're going to configure SELinux to allow you to run, uh, whatever you want to run from this, this custom folder that you've decided.[00:34:44] Josef: Yeah, you can say that you do almost the same amount of work for home or some other location I simply find it cleaner to do it this way and in a way. I even fulfilled the FHS, uh, suggestion, to put it to /srv but, uh, yeah, it's completely arbitrary. You can choose anything else. Uh, sysadmins choose www or whatever they like, and it's fine.It'll work. There's there's no problem. There. And, uh, and for the gems, actually, they could be in home, you know, but I just instruct bundler to put it to that location next to my application. [00:35:27] Jeremy: Okay. Rather than, than having a common folder for multiple applications to pull your libraries or your gems from, uh, you have it installed in the same place as the application. And that just keeps all your dependencies in the same place.[00:35:44] Josef: Yep, [00:35:45] Jeremy: and the example you're giving, you're, you're putting everything in /srv/ and then maybe the name of your application. Is that right? [00:35:55] Josef: Yeah. [00:35:55] Jeremy: Ok. Yeah. Cause I've, I've noticed that, Just looking at different systems. I've seen people install things into /opt. installed into /srv and it can just be kind of, tricky as, as somebody who's starting out to know, where am I supposed to put this stuff?So, so basically it sounds like just, just pick a place and, um, at least if it's in slash srv then sysadmins who are familiar with, the, the standard file system hierarchy will will know to, to look at.[00:36:27] Josef: yeah. Yeah. opt is also a yeah, common location, as you say, or, you know, if it's actually a packaged web application fedora it can even be in /usr/share, you know? So, uh, it might not be necessarily in locations we talked about before one of the things you cover in the book is. Setting up a deployment system and you're using, shell scripts in the case of the book. And I was wondering how you decide when shell scripts are sufficient and when you should consider more specialized tools like Ansible or chef puppet, things like.[00:37:07] Josef: yeah, I chose bash in the book because you get to see things without obstructions. You know, if I would be using, let's say Ansible and suddenly we are writing some YAML files and, uh, you are using a lot of, lot of Python modules to Ansible use and you don't really know what's going on at all times. So you learn to do things with ansible 2.0, let's say, and then new ansible comes out and you have to rely on what you did, you know, and I've got to rewrite the book. Uh, but the thing is that, with just Bash I can show, literally just bash commands, like, okay, you run this and this happens, And, another thing uh why I use it is that you realize how simple something can be.Like, you can have a typical cluster with sssh, uh, and whatever in maybe 20 bash commands around that, so it's not necessarily that difficult and, uh, it's much easier to actually understand it if it's just those 20, uh, 20 bash comments. Uh, I also think that learning a little bit more about bash is actually quite beneficial because you encounter them in various places.I mean, RPM spec files, like the packages are built. That's bash, you know, language version managers, uh, like pyenv rbenv that's bash. If you want to tweak it, if you have a bug there, you might look into source code and try to fix it. You know, it will be bash. Then Docker files are essentially bash, you know, their entry points scripts might be bash.So it's not like you can avoid bash. So maybe learning a little bit. Just a little bit more than, you know, and be a little bit more comfortable. I think it can get you a long way because even I am not some bash programmer, you know, I would never call myself like that. also consider this like, uh, you can have full featured rails application, maybe in 200 lines of bash code up and running somewhere.You can understand it in a afternoon, so for a small deployment, I think it's quite refreshing to use bash and some people miss out on not just doing the first simple thing possible that they can do, but obviously when you go like more team members, more complex applications or a suite of applications, things get difficult, very fast with bash.So obviously most people will end up with some higher level too. It can be Ansible. Uh, it can be chef, it might be Kubernetes, you know, so, uh, my philosophy, uh, again, it's just to keep it simple. If I can do something with bash and it's like. 100 lines, I will do this bash because when I come back to it in, after three years, it will work and I can directly see what I have to fix.You know, if there's a postgresql update at this new location whatever, I, I immediately know what to look and what to change. And, uh, with high-level tooling, you kind of have to stay on top of them, the new versions and, updates. So that's the best is very limited, but, uh, it's kind of refreshing for very small deployment you want to do for your side project. [00:40:29] Jeremy: Yeah. So it sounds like from a learning perspective, it's beneficial because you can see line by line and it's code you wrote and you know exactly what each thing does. Uh, but also it sounds like when you have a project that's relatively small, maybe there, there aren't a lot of different servers or, the deployment process isn't too complicated.You actually choose to, to start with bash and then only move to, um, something more complicated like Ansible or, or even Kubernetes. once your project has, has gotten to a certain size.[00:41:03] Josef: you, you would see it in the book. I even explain a multiple server deployment using bash uh, where you can actually keep your components like kind of separate. So like your database have its own life cycle has its own deploy script and your load balancer the same And even when you have application servers.Maybe you have more of them. So the nice thing is that when you first write your first script to provision one server configure one server, then you simply, uh, write another Uh, supervising script, that would call this single script just in the loop and you will change the server variable to change the IP address or something.And suddenly you can deploy tomorrow. Of course, it's very basic and it's, uh, you know, it doesn't have some, any kind of parallelization to it or whatever, but if you have like three application servers, you can do it and you understand it almost immediately. You know, if you are already a software engineer, there's almost nothing to understand and you can just start and keep going.[00:42:12] Jeremy: And when you're deploying to servers a lot of times, you're dealing with credentials, whether that's private keys, passwords or, keys to third-party APIs. And when you're working with this self hosted environment, working with bash scripts, I was wondering what you use to store your credentials and, and how those are managed.I use a desktop application called password safe, uh, that can save my passwords and whatever. and you can also put their SSH keys, uh, and so on.[00:42:49] Josef: And then I simply can do a backup of this keys and of this password to some other secure physical location. But basically I don't use any service, uh, online for that. I mean, there are services for that, especially for teams and in clouds, especially the, big clouds they might have their own services for that, but for me personally, again, I just, I just keep it as simple as I can. It's just on my, my computer, maybe my hard disk. And that's it. It's nowhere else. [00:43:23] Jeremy: So, so would this be a case of where on your local machine, for example, you might have a file that defines all the environment variables for each server. you don't check that into your source code repository, but when you run your bash scripts, maybe read from that file and, use that in deploying to the server?[00:43:44] Josef: Yeah, Uh, generally speaking. Yes, but I think with rails, uh, there's a nice, uh, nice option to use, their encrypted credentials. So basically then you can commit all these secrets together with your app and the only thing you need to keep to yourself, it's just like one variable. So it's much more easy to store it and keep it safe because it's just like one thing and everything else you keep inside your repository.I know for sure there are other programs that we have in the same way that can be used with different stacks that doesn't have this baked in, because rails have have it baked in. But if you are using Django, if you are using Elixir, whatever, uh, then they don't have it. But I know that there are some programs I don't remember the names right now, but, uh, they essentially allow you to do exactly the same thing to just commit it to source control, but in a secure way, because it's, encrypted.[00:44:47] Jeremy: Yeah, that's an interesting solution because you always hear about people checking in passwords and keys into their source code repository. And then, you know, it gets exposed online somehow, but, but in this case, like you said, it's, it's encrypted and, only your machine has the key. So, that actually allows you to, to use the source code, to store all that.[00:45:12] Josef: Yeah. I think for teams, you know, for more complex deployments, there are various skills, various tools from HashiCorp vault, you know, to some cloud provider's things, but, uh, you can really start And, keep it very, very simple.[00:45:27] Jeremy: For logging an application that you're, you're self hosting. There's a lot of different managed services that exist. Um, but I was wondering what you use in a self hosted environment and, whether your applications are logging to standard out, whether they're writing to files themselves, I was wondering how you typically approach that.[00:45:47] Josef: Yeah. So there are lots of logs you can have, right from system log, your web server log application log, database log, whatever. and you somehow need to stay on top of them because, uh, when you have one server, it's quite fine to just look in, in and look around. But when there are more servers involved, it's kind of a pain and uh so people will start to look in some centralized logging system.I think when you are more mature, you will look to things like Datadog, right. Or you will build something of your own on elastic stack. That's what we do on the project I'm working on right now. But I kind of think that there's some upfront costs uh, setting it all up, you know, and in terms of some looking at elastic stack we are essentially building your logging application.Even you can say, you know, there's a lot of work I also want to say that you don't look into your logs all that often, especially if you set up proper error and performance monitoring, which is what I do with my project is one of the first thing I do.So those are services like Rollbar and skylight, and there are some that you can self host so if people uh, want to self host them, they can. But I find it kind of easier to, even though I'm self hosting my application to just rely on this hosted solution, uh, like rollbar, skylight, appsignal, you know, and I have to say, especially I started to like appsignal recently because they kind of bundle everything together.When you have trouble with your self hosting, the last thing you want to find yourself in a situation when your self hosted logs and sources, error reporting also went down. It doesn't work, you know, so although I like self-hosting my, my application.[00:47:44] Josef: I kind of like to offload this responsibility to some hosted hosted providers.[00:47:50] Jeremy: Yeah. So I think that in and of itself is a interesting topic to cover because we've mostly been talking about self hosting, your applications, and you were just saying how logging might be something that's actually better to use a managed service. I was wondering if there's other. Services, for example, CDNs or, or other things where it actually makes more sense for you to let somebody else host it rather than your [00:48:20] Josef: I think that depends. Logging for me. It's obvious. and then I think a lot of, lots of developers kind of fear databases. So there are they rather have some kind of, one click database you know, replication and all that jazz back then so I think a lot of people would go for a managed database, although it may be one of those pricy services it's also likes one that actually gives you a peace of mind, you know? maybe I would just like point out that even though you get all these automatic backups and so on, maybe you should still try to make your own backup, just for sure. You know, even someone promised something, uh, your data is usually the most valuable thing you have in your application, so you should not lose it.And some people will go maybe for load balancer, because it's may be easy to start. Like let's say on DigitalOcean, you know, uh, you just click it and it's there. But if you've got opposite direction, if you, for instance, decide to, uh, self host your uh load balancer, it can also give you more, options what to do with that, right?Because, uh, you can configure it differently. You can even configure it to be a backup server. If all of your application servers go down. Which is maybe could be interesting use case, right? If you mess up and your application servers are not running because you are just messing with, with them. Suddenly it's okay. Because your load balancers just takes on traffic. Right. And you can do that if it's, if it's your load balancer, the ones hosted are sometimes limited. So I think it comes to also, even if the database is, you know, it's like maybe you use some kind of extension that is simply not available. That kind of makes you, uh, makes you self host something, but if they offer exactly what you want and it's really easy, you know, then maybe you just, you just do it.And that's why I think I kind of like deploying to uh, virtual machines, uh, in the cloud because you can mix and match all the services do what you want and, uh, you can always change the configurations to fit, to, uh, meet your, meet your needs. And I find that quite, quite nice.[00:50:39] Jeremy: One of the things you talk about near the end of your book is how you, you start with a single server. You have the database, the application, the web server, everything on the same machine. And I wonder if you could talk a little bit about how far you can, you can take that one server and why people should consider starting with that approach. Uh, I'm not sure. It depends a lot on your application. For instance, I write applications that are quite simple in nature. I don't have so many SQL calls in one page and so on.[00:51:13] Josef: But the applications I worked for before, sometimes they are quite heavy and, you know, even, with little traffic, they suddenly need a more beefy server, you know, so it's a lot about application, but there are certainly a lot of good examples out there. For instance. The team, uh, from X-Plane flight simulator simulator, they just deploy to one, one server, you know, the whole backend all those flying players because it's essentially simple and they even use elixir which is based on BEAM VM, which means it's great for concurrency for distributed systems is great for multiple servers, but it's still deployed to one because it's simple. And they use the second only when they do updates to the service and otherwise they can, they go back to one.ANother one would be maybe Pieter Levels (?) a maker that already has like a $1 million business. And it's, he has all of his projects on one server, you know, because it's enough, you know why you need to make it complicated. You can go and a very profitable service and you might not leave one server. It's not a problem. Another good example, I think is stackoverflow. They have, I think they have some page when they exactly show you what servers they are running. They have multiple servers, but the thing is they have only a few few servers, you know, so those are the examples that goes against maybe the chant of spinning up hundreds of servers, uh, in the cloud, which you can do.It's easy, easier when you have to do auto scaling, because you can just go little by little, you know, but, uh, I don't see the point of having more servers. To me. It means more work. If I can do it, if one, I do it. But I would mention one thing to pay attention to, when you are on one server, you don't want suddenly your background workers exhaust all the CPU so that your database cannot serve, uh, your queries anymore right? So for that, I recommend looking into control groups or cgroups on Linux. When you create a simple slice, which is where you define how much CPU power, and how much memory can be used for that service. And then you attach it to, to some processes, you know, and when we are talking about systemd services.They actually have this one directive, uh, where you specify your, uh, C group slice. And then when you have this worker server and maybe it even forks because it runs some utilities, right? For you to process images or what not, uh, then it will be all contained within that C group. So it will not influence the other services you have and you can say, okay, you know, I give worker service only 20% of my CPU power because I don't care if they make it fast or not.It's not important. Important is that, uh, every visitor still gets its page, you know, and it's, they are working, uh, waiting for some background processes so they will wait and your service is not going down.[00:54:34] Jeremy: yeah. So it sort of sounds like the difference between if you have a whole bunch of servers, then you have to, Have some way of managing all those servers, whether that's Kubernetes or something else. Whereas, um, an alternative to that is, is having one server or just a few servers, but going a little bit deeper into the capabilities of the operating system, like the C groups you were referring to, where you could, you could specify how much CPU, how much Ram and, and things, for each service on that same machine to use.So it's kind of. Changing it, I don't know if it's removing work, but it's, it's changing the type of work you do. [00:55:16] Josef: Yeah, you essentially maybe have to think about it more in a way of this case of splitting the memory or CPU power. Uh, but also it enables you to use, for instance, Unix sockets instead of TCP sockets and they are faster, you know, so in a way it can be also an advantage for you in some cases to actually keep it on one server.And of course you don't have a network trip so another saving. So to get there, that service will be faster as long as it's running and there's no problem, it will be faster. And for high availability. Yeah. It's a, it's obviously a problem. If you have just one server, but you also have to think about it in more complex way to be high available with all your component components from old balancers to databases, you suddenly have a lot of things.You know, to take care and that set up might be complex, might be fragile. And maybe you are better off with just one server that you can quickly spin up again. So for instance, there's any problem with your server, you get alert and you simply make a new one, you know, and if you can configure it within 20, 30 minutes, maybe it's not a problem.Maybe even you are still fulfilling your, uh, service level contract for uptime. So I think if I can go this way, I prefer it simply because it's, it's so much easy to, to think about it. Like that.[00:56:47] Jeremy: This might be a little difficult to, to answer, but when you, you look at the projects where you've self hosted them, versus the projects where you've gone all in on say AWS, and when you're trying to troubleshoot a problem, do you find that it's easier when you're troubleshooting things on a VM that you set up or do you find it easier to troubleshoot when you're working with something that's connecting a bunch of managed services? [00:57:20] Josef: Oh, absolutely. I find it much easier to debug anything I set on myself, uh, and especially with one server it's even easier, but simply the fact that you build it yourself means that you know how it works. And at any time you can go and fix your problem. You know, this is what I found a problem with services like digital ocean marketplace.I don't know how they call this self, uh, hosted apps that you can like one click and have your rails django app up, up and running. I actually used when I, uh, wasn't that skilled with Linux and all those things, I use a, another distribution called. A turnkey Linux. It's the same idea. You know, it's like that they pre prepare the profile for you, and then you can just easily run it as if it's a completely hosted thing like heroku, but actually it's your server and you have to pay attention, but I actually don't like it because.You didn't set it up. You don't know how it's set up. You don't know if it has some problems, some security issues. And especially the people that come for these services then end up running something and they don't know. I believe they don't know because when I was running it, I didn't know. Right. So they are not even know what they are running.So if you really don't want to care about it, I think it's completely fine. There's nothing wrong with that. But just go for that render or heroku. And make your life easier, you know,[00:58:55] Jeremy: Yeah, it sounds like the solutions where it's like a one-click install on your own infrastructure. you get the bad parts of, of both, like you get the bad parts of having this machine that you need to manage, but you didn't set it up. So you're not really sure how to manage it.you don't have that team at Amazon who, can fix something for you because ultimately it's still your machine. So That could have some issues there. [00:59:20] Josef: Yeah. Yeah, exactly. I will. I would recommend it or if you really decide to do it, at least really look inside, you know, try to understand it, try to learn it, then it's fine. But just to spin it up and hope for the best, uh, it's not the way to go [00:59:37] Jeremy: In, in the book, you, you cover a few different things that you use such as Ruby on rails and nginx, Redis, postgres. Um, I'm assuming that the things you would choose for applications you build in self hosts. You want them to have as little maintenance as possible because you're the one who's responsible for all of it.I'm wondering if there's any other, applications that you consider a part of your default stack that you can depend on. And, that the, the maintenance burden is, is low. [01:00:12] Josef: Yeah. So, uh, the exactly right. If I can, I would rather minimize the amount of, uh, dependencies I have. So for instance, I would think twice of using, let's say elastic search, even though I used it before. And it's great for what it can do. Uh, if I can avoid it, maybe I will try to avoid it. You know, you can have descent full text search with Postgres today.So as long as it would work, I would uh, personally avoid it. Uh, I think one relation, uh, database, and let's say redis is kind of necessary, you know, I I've worked a lot with elixir recently, so we don't use redis for instance. So it's kind of nice that you can limit, uh, limit the number of dependencies by just choosing a different stack.Although then you have to write your application in a little different way. So sometimes even, yeah. In, in such circumstances today, this could be useful. You know, I, I think, it's not difficult to, to run it, so I don't see, I don't see a problem there. I would just say that with the services, like, uh, elastic search, they might not come with a good authentication option.For instance, I think asked et cetera, offers it, but not in the free version. You know, so I would just like to say that if you are deploying a component like that, be aware of it, that you cannot just keep it completely open to the world, you know? Uh, and, uh, maybe if you don't want to pay for a version that has it, or maybe are using it at the best, it doesn't have it completely.You could maybe build out just a little bit tiny proxy. That would just do authentication and pass these records back and forth. This is what you could do, you know, but just not forget that, uh, you might run something unauthenticated.I was wondering if there is any other, applications or capabilities where you would typically hand off to a managed service rather than, than trying to deal with yourself. [01:02:28] Josef: Oh, sending emails, not because it's hard. Uh, it's actually surprisingly easy to start sending your own emails, but the problem is, uh, the deliverability part, right? Uh, you want your emails to be delivered and I think it's because of the amount of spam everybody's sending.It's very difficult to get into people's boxes. You know, you simply be flagged, you have some unknown address, uh, and it would just it would just not work. So actually building up some history of some IP address, it could take a while. It could be very annoying and you don't even know how to debug it. You, you cannot really write Google.Hey, you know, I'm, I'm just like this nice little server so just consider me. You cannot do that. Uh, so I think kind of a trouble. So I would say for email differently, there's another thing that just go with a hosted option. You might still configure, uh, your server to be sending up emails. That could be useful.For instance, if you want to do some little thing, like scanning your system, a system log and when you see some troublesome. Logging in all that should, it shouldn't happen or something. And maybe you just want an alert on email to be sent to you that something fishy is going on. And so you, you can still set up even your server, not just your main application and might have a nice library for that, you know, to send that email, but you will still need the so-called relay server. to just pass your email. You. Yeah, because building this trust in an email world, that's not something I would do. And I don't think as a, you know, independent in the maker developer, you can really have resources to do something like that. So will be a perfect, perfect example for that. Yeah.[01:04:22] Jeremy: yeah, I think that's probably a good place to start wrapping up, but is there anything we missed that you think we should have talked about? [01:04:31] Josef: we kind of covered it. Maybe, maybe we didn't talk much about containers, uh, that a lot of people nowadays, use. uh, maybe I would just like to point out one thing with containers is that you can, again, do just very minimal approach to adopting containers. You know, uh, you don't need to go full on containers at all.You can just run a little service, maybe your workers in a container. For example, if I want to run something, uh, as part of my application, the ops team, the developers that develop this one component already provide a Docker file. It's very easy way to start, right? Because you just deployed their image and you run it, that's it.And they don't have to learn what kind of different stack it is, is a Java, is it python, how I would turn it. So maybe you care for your own application, but when you have to just take something that's already made, and it has a Docker image, you just see the nice way to start. And one more thing I would like to mention is that you also don't really need, uh, using services like Docker hub.You know, most people would use it to host their artifacts that are built images, so they can quickly pull them off and start them on many, many servers and blah, blah. But if you have just one server like me, but you want to use containers. And I think it's to just, you know, push the container directly. Essentially, it's just an archive.And, uh, in that archive, there are few folders that represent the layers. That's the layers you build it. And the Docker file and that's it. You can just move it around like that, and you don't need any external services to run your content around this little service.[01:06:18] Jeremy: Yeah. I think that's a good point because a lot of times when you hear people talking about containers, uh, it's within the context of Kubernetes and you know, that's a whole other thing you have to learn. You have to learn not only, uh, how containers work, but you have to learn how to deploy Kubernetes, how to work with that.And, uh, I think it's, it's good to remind people that it is possible to, to just choose a few things, run them as containers. Uh, you don't need to. Like you said, even run, everything as containers. You can just try a few things. [01:06:55] Josef: Yeah, exactly.[01:06:57] Jeremy: Where can people, uh, check out the book and where can they follow you and see what you're up to.[01:07:04] Josef: uh, so they can just go to deploymentfromscratch.com. That's like the homepage for the book. And, uh, if they want to follow up, they can find me on twitter. Uh, that would be, uh, slash S T R Z I B N Y J like, uh, J and I try to put updates there, but also some news from, uh, Ruby, Elixir, Linux world. So they can follow along.[01:07:42] Jeremy: Yeah. I had a chance to, to read through the alpha version of the book and there there's a lot of, really good information in there. I think it's something that I wish I had had when I was first starting out, because there's so much that's not really talked about, like, when you go look online for how to learn Django or how to learn Ruby on Rails or things like that, they teach you how to build the application, how to run it on your, your laptop.but there's this, this very, large gap between. What you're doing on your laptop and what you need to do to get it running on a server. So I think anybody who's interested in learning more about how to deploy their own application or even how it's done in general. I think they'll find the book really valuable.[01:08:37] Josef: Okay. Yeah. Thank you. Thank you for saying that. Uh, makes me really happy. And as you say, that's the idea I really packed, like kind of everything. You need in that book. And I just use bash so, it's easier to follow and keep it without any abstractions. And then maybe you will learn some other tools and you will apply the concepts, but you can do whatever you want.[01:09:02] Jeremy: All right. Well, Josef thank you, so much for talking to me today.[01:09:05] Josef: Thank you, Jeremy.
We've been running Software Daily for 6 years. When we started it back in 2015, the goal was to create a software engineering podcast that was 60% as good as Software Engineering Radio, but aired with 5x the frequency. 5 shows per week, 50 weeks per year, 5 ads per show, means 1,250 advertising spots.
We've been running Software Daily for 6 years. When we started it back in 2015, the goal was to create a software engineering podcast that was 60% as good as Software Engineering Radio, but aired with 5x the frequency. 5 shows per week, 50 weeks per year, 5 ads per show, means 1,250 advertising spots. The post Our Current Strategy appeared first on Software Engineering Daily.
We've been running Software Daily for 6 years. When we started it back in 2015, the goal was to create a software engineering podcast that was 60% as good as Software Engineering Radio, but aired with 5x the frequency. 5 shows per week, 50 weeks per year, 5 ads per show, means 1,250 advertising spots. The post Our Current Strategy appeared first on Software Engineering Daily.
We've been running Software Daily for 6 years. When we started it back in 2015, the goal was to create a software engineering podcast that was 60% as good as Software Engineering Radio, but aired with 5x the frequency. 5 shows per week, 50 weeks per year, 5 ads per show, means 1,250 advertising spots. The post Our Current Strategy appeared first on Software Engineering Daily.
Michael Ashburne and Maxwell Huffman are QA Managers at Aspiritech. This episode originally aired on Software Engineering Radio.Related Links: Aspiritech Section 508 Test for Accessibility ANDI Accessibility Testing Tool Windows Hardware Compatibility Program Audio over Bluetooth TranscriptYou can help edit this transcript on GitHub.Jeremy: [00:00:00] Today I'm joined by Maxwell, Huffman and Michael Ashburn. They're both QA managers at Aspiritech. I'm going to start with defining quality assurance. Could one of you start by explaining what it is?Maxwell: [00:00:15] So when I first joined Aspiritech, I was kind of curious about that as well. One of the main things that we do at Aspiritech besides quality assurance is we also, give meaningful employment to individuals on the autism spectrum. I myself am on the autism spectrum and that's what, initially attracted me to the company.quality assurance in a nutshell is making sure that, products and software is not defective. That it functions the way it was intended to function. Jeremy: [00:00:47] how would somebody know when they've, when they've met that goal? Michael: [00:00:50] It all depends on the client's objectives. I guess. quality assurance testing is always about trying to mitigate risk. There's only so much testing that is realistic to do, you know, you could test forever and never release your product and that's not good for business. It's really about, you know, balancing, like how likely is it that the customer is going to encounter defect X, how much time and energy would be required to, to fix it?Overall company reputation, impact, there's all sorts of different metrics. Uh, and every, every customer is unique really they, they get to set the pace, Maxwell: [00:01:30] does the product work well? is the user experience frustrating or not? that's always a bar that I look for. One of the main things that we review in the different defects that we find is customer impact.and how much of this is going to frustrate the customers. And when we're going through that analysis, is this cost effective or not. The client they'll determine it's worth, the cost of the, uh, quality assurance and of the fix of the software to make sure that that customer experience is smooth.Jeremy: [00:02:03] When you talk to, to software developers, now, a lot of them are familiar with things like they need to test their code right. They have things like unit tests and integration tests that they're running regularly. where does quality assurance fit in with that? Like, is that considered a part of quality assurance is quality assurance something different?Michael: [00:02:24] we try to partner with our clients, because the goal is the same, right. It's to release a quality product that's as free of defects, as, you know, as possible.We have multiple clients that will let us know these are clients typically that we've worked with for a long time that have sort of established a rhythm. they'll let us know when they've got a new product in the pipeline and as soon as they have available, Uh, software requirements, documentations specs, user guides, that kind of thing.They'll provide that to us, to be able to then plan. Okay. You know, what are these new features? Uh, what defects have been repaired since the last build or, you know, it all depends on what the actual product is. And we start preparing tests even before there may be, uh, A version of the software to test, you know, now that's more of a, what they call a waterfall approach where it's kind of a back and forth where, you know, the client preps the software, we test the software.If there's something amiss, the client makes changes. Then they give us a new build. but we just as well, we work in, uh, iterative design or agile is a popular term, of course, where. We have embedded testers, that are, you know, on a daily basis, interacting with, uh, client developers to address, you know, to, to verify certain parts of the code as it's being developed.Because of course the problem with waterfall is you find a defect and it, it could be deep in the code or some sort of linchpin aspect of the code. And then there's a lot of work to be done. To try to fix that sort of thing. Whereas, you know, embedded testers can identify a defect or, or even just like a friction point as early as possible.And so then they don't have to, you know, tear it all down and start over and it's just, Oh, fix that, you know, while they're working on that part, basically.Jeremy: [00:04:18] so I think there's two things you touched on there. One is. the ability to bring in QA early into the process. And if I understand correctly, what you were sort of describing is. even if you don't have a complete product yet, if you just have an idea of what you want to build, You were saying you start to generate test cases and it almost feels like you would be a part of generating the requirements generating. Like, what are the things that you need to build into your software before uh, the team that's building it necessarily knows themselves did that I sort of get that.Maxwell: [00:04:55] I've been in projects that we've worked with the product from cradle to grave. a lot of them haven't gotten all the way to a grave yet, but, um, some of them, the amount of support that they're offering. It's reached that milestone in its life cycle, where they're no longer going to, um, address the defects in the same way.They want to know that they're there. They want to know what exists. But then now there are new products that are being created, right? So we are, um, engaged in embedded testing, which is, which is testing, certain facets of the code actively, and making sure that it's, doing what it needs to do.And we can make that quick patch on that code and put it out to market. And we're also doing that at earlier stages with, in, in earlier development where before it's an even, fully formed design concepts, we're offering suggestions, and recommending that, you know, this doesn't follow with the design strategy and the concept design.so that part of embedded testing or unit testing, can be involved at earlier stages as well. For sure.Michael: [00:06:08] Of course, those, you have to be very, careful you know, we wouldn't necessarily make blanket recommendations to a new client, a lot of the clients that we have, we have been with us for several years. And so. You know, you develop a rhythm, common vocabulary, you know, you know, which generally speaking, which, goals weigh more than other goals and things like that from, from client to client or even coder to coder.it's only once we've really developed that. shared language that, you know, we would say by the way, you know, such and such as missing from blankety blank, say great example with a bunch of non words in it, but I think you get the picture.Jeremy: [00:06:48] when you're first starting to work on a project, you don't know a whole lot about it, right? you're trying to, to understand how this product is supposed to work and, what does that process look like? Like what should a company be providing to you? What are the sorts of meetings or conversations you're having that, that sort of thing.Michael: [00:07:08] we'll have an initial meeting with a handful of people from both sides and just sort of talk about both what we can bring to the project and what their objectives are. and, and, you know, the, the thing that they want us to test, if you will. And, if we reach an agreement that we want to move forward, then the next step would be like a product demo, basically, we would come together and we would start to fold in, you know, leads and some other analysts, you know, people that were, might be a good match for the project say, and we always ask, our clients.And they're usually pretty accommodating. if we can record the meeting, you know, now everyone's meeting on Google meet and virtually and so forth. And so, uh, that makes it a little easier, but a lot of our analysts have everyone has their own learning style. Right. You know, some people are more auditory, some people are more visual.So we preserve, you know, the client's own demonstration of what it's either going to be like, or is like or is wrong or whatever they want us to know about it. and then we can add that file to our secure project folder and anybody down the road that's being onboarded. Like that's, that's a resource an asynchronous resource that they can turn to right? A person doesn't have to re demonstrate the software to onboard them, or sometimes, you know, by the time we're onboarding new people, the software has changed enough that we have to set those aside actually. And then you have to do a live in person kinda deal.Maxwell: [00:08:32] and you really want to consider, individuals on the, on the spectrum, the different analysts and testers they do have different learning styles. We do want to ask for as many different. resources that are available, to, accommodate for that, but also to have us be, the best enabled to be the subject matter experts on the product.so what we've found is that what we're really involved in is writing test cases and, and, and rewriting test cases to humanize the software to really get at, what are you asking this software to do that in turn is what the product is doing. a lot of the testing we do is black box testing, and we want to understand what the original design concept is.So that involves the user interface, design document, right? early stages of that, if available, or just that, dialogue that Michael was referring to, to get that common language of what do you want this product to do? What are you really asking this code to do? having recordings, or any sort of training material, is absolutely essential. To being the subject matter experts and then developing the kind of testing that's required for that.Michael: [00:09:48] And all sorts of different clients have different, different amounts of testing material, so to speak I mean, everything from, you know, a company that has their own internal, test tracking software and they just have to give us access to it. And the test cases are already there to, a piece of paper, like a physical piece of paper that they copied the checklist into Excel.And now, like, these are the things that we look at, but of course there's always a lot more to it than that, but that at least gives us a starting point to sort of to build off of and, you know, testing areas and sections and, you know, sort of thematically related features, things like that. And then we can, we develop our own tests, on their behalf, basically.Jeremy: [00:10:29] And when you're building out your own tests, what, what would be the, the level of detail there? Would it be a high level thing that you want to accomplish in the software and then like absolute step by step, click by click, Michael: [00:10:42] You know, I hate to make every answer conditional, right. But it sort of depends on the software itself and what the client's goals are. one of our clients, uh, is developing a new, screen-sharing app that's for developers both work on the same code at the same time, but they can take turns, typing, controlling the mouse, that sort of thing.and although this product has been on the market for awhile, we started out with one of those checklists and now have hundreds of test cases based on, both features that they've added, as well as weird things that we found like, Oh, make sure sometimes you have to write a test case, uh, that tests for the negative, like the, the absence of a problem, right?So you can make sure X connects to Y and the video doesn't drop or. If you can answer the connection, on, before the first ring is done and it successfully connects anyway, or, or, you know, any host of, of options. So our test cases, for that project, we have a lot of, uh, screen caps and stuff because a picture's worth a thousand words as the cliche goes.but we also try to describe, describe the features, not just, you know, present the picture with an arrow, like click here and see what happens. Because again, everyone has sort of different data processing styles and some would prefer to read step by step instructions rather than try to interpret, you know, some colors in a picture.And what does this even mean out of context?Maxwell: [00:12:08] and lots of times you'll end up potentially seeing test cases they seem like they could be very easily automated. Cause literally they're written all in code. and the client will occasionally ask us to do a test cycle scrub or they'll ask us, okay, well, what can be automated within this?Right. But one of the key things we really look at is, is to try to humanize that test case a little more away from that just basic automation, lots of times that, that. Literally involves asking, what are you trying to get out of this out of this test case? cause it's fallen so much into the, into the weeds that you no longer can really tell what you're really asking it to really do So lots of times we will, we will help them automate them. But also just give it the proper test environment. and the, and the the proper steps, you'd really be amazed. How many test cases just do not have the proper steps get an, an actual expected result. And if it's written wrong at that basic manual level, you're not adding value.so that's one thing that we, that we really have found it's added value to the clients and to their test cycles.Michael: [00:13:21] A lot of people ask about automation because it's a very sexy term right now. And it certainly has its place. Right. But uh you can't automate new feature testing. it has to be an aspect of the product that's mature. Not changing from build to build. And you also have to have test cases that are mature, that you know, every little virtual or otherwise, you know, T is crossed and I is dotted, or else you end up having to do manual testing anyway, because the computer just goes ohit didn't work. Because that's really all the, you know, the automated process can do is either it passes or it doesn't. And so then we have to come in and, and we have clients where we do plenty of that. Like, okay, they ran through the tests and these three failed figure out why, and then they go in and start digging around and, Oh, it turns out this is missing or this got moved in the latest update or something like that.Jeremy: [00:14:12] that's an interesting, perspective for testing in general, where it sounds like when a feature is new, when you're making a lot of changes to how the software works. that's when, manual testing can actually be really valuable because as a person, you have a sense of what you want and if things kind of move around or don't work exactly the way you expect them to, but you kind of know what the end goal is. you have an idea of like, yes, this worked or no, this didn't. and then once that's solidified then that's when you said it's easier to, to shift into automatic testing. for example, having, an application, spin up a browser and click through things or, or trigger things through code, things like that.Michael: [00:14:58] And you have to, you know, you have to get the timing just right. Cause the computer can only wait in so many increments and you know, if it, if it tries to click the next thing too soon and it hasn't finished loading, you know, then it's all over. but that's actually the, the, the discernment that you were sort of referring to the, the, using your judgment when executing a test.that's where we really, we really do our best work and we have some analysts that specialize in exploratory testing, which is where you're just sort of looking around systematically or otherwise. I personally have never been able to do that very well. uh, but that's critical because those, those exploratory tests are always where you turn up the weirdest combination of things.Oh, I happened to have this old pair of headphones on and when I switched from Bluetooth to. manual plug, you know, just disconnected the phones or the, you know, the conference call altogether you know, and who does that. Right. But, you know, there's all, all sorts of different kinds of combinations and, and, and who knows what the end user is going to bring. He's not going to necessarily buy all new gear, right. When he gets the new computer, the new software, whatever.Jeremy: [00:16:05] I feel like there's been a. Uh, kind of a trend in terms of testing at software companies, where they, they used to commonly have, in-house testing or in-house QA, it would be separated from development.And now you're seeing more and more of, people on the engineering staff, on the developing staff being responsible for testing their own software, whether that be through unit tests, integration tests, Or even just using the software themselves, where you're getting to the point where you have more and more people are engineers that maybe have some expertise or some knowledge in tests and less, so people who are specifically dedicated to test. and so I wonder from your perspective you know, a QA firm or just testers in general? Like what their role is in, in software development going forward.Maxwell: [00:16:55] having specialized individuals that are constantly testing it and analyzing the components and making sure that you're on track to make that end concept design come to life really is essential. And that's what you get with the quality assurance. It's like a whole other wing of your company that basically is making sure that everything you are, that you are doing with this, product and with this software, is within scope.and you can't be doing anything better as well. that's the other aspect of it, right? cause lots of times when we find a component and we found something, that we've broken or we've found a flaw in the design we look at, what that means.bigger picture um, with the overall product. and we try to figure out all right, well, does this part of the functionality... is it worth it to fix this part of the functionality? is it cost-effective right. So lots of times quality assurance.it comes right down to the, to the cost-effectiveness of the different, patches. and lots of times it's even the safety. of the, uh, product itself. it all depends on what exactly you're, you're designing, but I can give you an, an example of a, of a product that, that we were, that we were working with in the past, where we were able to get a component to overheat, obviously that is a critical defect that needs to be addressed and fixed.that's something that can be found. as you're just designing the product. But to have a specialized division, that's just focused on quality assurance. They're more liable, they're more inclined. And that is what their directive is, is to find those sorts of defects. And I'll tell you the defects that we found that overheated this, this product, it was definitely an exploratory, find it was actually caught.Off of a test case that was originally automated. so we definitely, we're engaged in every aspect or a lot of the aspects of the, of the engineering departments with this, uh, products. but in the end it was exploratory testing.It was out of scope of what they had automated then ended up finding us. That's where I really see quality assurance in this, in this field within software engineering, really gaining respect and gaining momentum in understanding that, Hey, these are, these are really intelligent, potentially software engineers themselves.That their key focus is to, is to testing our product and making sure that it's a design that, that is within the scope.Michael: [00:19:36] It's helpful to have a fresh set of eyes too you know, if a person's been working on a product for, you know, day in, day out for months on end, inevitably there will be aspects that become second nature. may allow them to effectively like skip steps in the course of testing, some end result when they're doing their own testing, but you bring in, you know, a group of a group of analysts who know testing, but don't know your product other than generally what it's supposed to do and you sort of have at it and you find all sorts of interesting things that way.Jeremy: [00:20:13] Yeah, I think you brought up two interesting points. one of them is the fact that. Nowadays, there is such a big focus on, automated testing as a part of a continuous integration process, right? Somebody will write code they'll check in their code, it'll build, and then automated tests will see that it's still working.But those tests that the developers wrote, they're never going to find things that there were never a test written for. Right. So, I think that whole exploratory, testing aspect is interesting. and then Maxwell also brought up a good point uh, it sounds like QA can also not just help find what defects or issues exists, but they can also help. grade how much of an issue those defects are so that, the developers they can prioritize. Okay. Which ones are really a big deal that we need to fix, uh, versus what are things that, yeah, it's, I guess it's a little broken, but it's not, not such a big deal.Maxwell: [00:21:14] in a broader sense, there are certain whole areas of design right now. Uh, Bluetooth is a really, uh, big area that we've been working in. I'm the QA manager for the Bose client at, Aspiritech and Bluetooth is really a big thing that, that is, that is involved in all of their different speakers.So obviously if we, if we find anything, anything wrong with, with a certain area, you know, we want them to consider what areas they might want to focus more manual testing and less automation on. Right. and we're always thinking about, feature specific in that sense, um, to help the clients out as well.and analysts that are on the spectrum, they really have. it's fascinating how, how they tend to be very particular about certain defects. and, and they can really find things that are very exploratory, but they don't miss. the, uh, forest for the trees in the sense that they still maintain, the larger concept design, funnily enough, where they can let you know, you know, is Bluetooth really the factor in this, that should be fixed here or is it, or is it something else? to, it leads to different, to interesting avenues for sure.Michael: [00:22:32] Yeah, Bluetooth is really, A bag of knots in a lot of ways, you know, the different versions, different hardware vendors, we work with zebra technologies and they make barcode printers and scanners and so forth.And you know, many of their printers are Bluetooth enabled. but you know, the question is, is it Bluetooth 4 to Bluetooth, 4 is it, backwards compatible.And, uh, a certain, uh, rather ubiquitous, computer operating system is notorious for having trouble with Bluetooth management, whether it's headphones or printers or whatever. and in that instance, because we want to, you know, we're not testing the computer OS, we're testing the driver for the printer. Right. So, part of the protocol we wound up having to build into the, into the test cases is like, okay, first go in deactivate, the computer's own, resident internal hardware, Bluetooth, then connect to this, you know, third party USB dongle, install the software, make sure it's communicating, then try to connect to your printer For a long time, an analyst would run into some kind of issue.And the first question is always, are you using the computer Bluetooth? Or is it a third-party Bluetooth and is discoverable on, is it a Bluetooth, low energy because you don't want to print using Bluetooth low energy because it'll take forever. Right? And then the customer thinks, Oh, this isn't working. It's broke. You know, not even knowing that there's. Multiple kinds of Bluetooth and yeah. It's, uh, it's hairy for sure.Jeremy: [00:24:00] Yeah. And then I guess, as a part of that, that process, you're finding out that there isn't necessarily a problem in the customer's software. but it's some external case so that, when you get a support ticket or a call, then, you know, like, okay, this is another thing we can ask them to check. Yeah.Maxwell: [00:24:18] Absolutely. And then that's something that we are, that we've been, you know, definitely leveraged for, to help out, to try to resolve customer issues that come in as well, and try to set up a testing environment that mimics that. And, and we've occasionally. integrated that to, to become part of our manual testing and some automated scenarios as well.so that, so those have been interesting scenarios having to buy different routers and what, and what have you. And once again, it gets back to the cost-effectiveness of it. You know, what is, what is the market impact? Yes. This particular AT&T router or what have you, um, might be having an issue. but you know, how many, how many users in the wind, the world are really running the software on this.Right. and that's something that, everyone needs to, you know, that every company should consider when they're, considering, uh, you know, a a patch, um, in the, in the software.Jeremy: [00:25:14] and something you, you also brought up is. As a, as a software developer, when there is a problem. One of the things that we always look for is we look for a reproducible case, right? Like, what are the steps, um, you need to take to have this bug or this problem occur. And it sounds like one of the roles might be.we get in a report from a customer saying like this part of the software doesn't work. Um, but I'm not sure when that happens or how to get it to happen. And so, as a QA, uh, analysts, one of your roles might be taking those reports and then building a repeatable, um, test case.Michael: [00:25:55] Absolutely. There's lots of times where clients have said we haven't been able to reproduce this, see if you can. And you know, we get back to them after some increment of time. And, sometimes we can, and sometimes we can't, you know, sometimes we have to buy special, uh, like headphones or some kind of, you know, try to reproduce the investment that the client was using.Uh, in case there was some magic sauce interaction going on there.Maxwell: [00:26:21] our analysts on the spectrum. they are so particular in writing up defects, all the little details. and that really is so important in quality assurance is documentation for the entire process. that's one area where I think quality assurance really helps development in general is making sure that everything is documented that it's all there on paper and the documentation is, is solid and really sound. Um, so for a lot of these defects, we've actually come in and I think up the standard a little bit where you can't have the defect written where, you know, the reproducibility is one out of one.And it turns out this was a special build that the developer was using that no one else in the company was even using. It's a waste of time to track this defect down. And that's based on the fact that it was a poorly written up report in the first place.so it can be fun to have to. Track down all the various equipment you need for it. And analysts are really well-suited for writing those up and, and, uh, and investigating these different defects that are errors that we, that we find sometimes they're, sometimes they're not actually defects. They're just errors in the system. Michael: [00:27:34] uh, tell them about like The Bose guide that Bose wound up using the guide that we had made internally.Maxwell: [00:27:40] Yeah. There have been, so many guides that we've ended up creating that have been like terminal access, shortcuts, uh, just different, different ways to, you know, access the, uh, system, from a tester perspective that have, that have absolutely helped, just documenting all these things that engineers end up using to test code. Right. But lots of times these shortcuts Aren't well documented anywhere. so what quality assurance and what the Aspiritech has done, is we come in and we really create excellent training guides for how to, how to check the code.Um, and, and what all the various commands are that have to be inputted and how that translates to what, the more obvious user experiences, which is, I think a lot of times what ends up being lost. it ends up all being code language and you don't really know what the user experiences, it's nice to, to be able to, To have found that, that the guides that we've created when we show them to the clients, because really we created them to make life easier for us and make the testing easier for us to make it more translatable. When you see all this different code that some of us are very well versed in. but other analysts might not be. As well-versed in the code or that aspect of it. Right. but once you humanize it and you, and you sort of say, okay, well, this is what you're asking the code to do. then you have that other perspective of, I actually can think of a better way that we could Potentially do this. so we've brought a lot of those guides to the clients and they've really been, they've really been blown away, at how well documented All of that was, um, all the way down to the level of the, uh, GUIDs of all the systems. We have very good inventory tracking, and even being able to test and run, internal, components of the system. and that's why I bring up the a G U I D S so a lot of the testing that we end up doing, or I wouldn't say a lot, but a portion of it is the sort of tests that installers would be running this sort of functionality that only installers of systems would be, would be running. So, it's still, it's still black box testing, but it's behind the scenes of what the normal user experiences. Right. It's sort of the installer experience for lack of a better word.And even having that well-documented and finding errors. In, in those processes have been quite beneficial. I, I remember one scenario in which there was an emergency update method that we had to test, right. And this is, this was a type of method where if someone had to run it, they would take it into the store and a technician would run it right. So basically we're, we're running software quality assurance on a technicians test for a system and the way a technician would update the system. And what we found is that what they were asking the technician to do was a flawed and complex series of steps. it did work. But only one out of 30 times, and only if you did everything in a very particular timing.And it just was not something that was user-friendly, for the, a tech technician. So it's the kind of thing that we ended up finding. And lots of times it requires the creation of a guide because they don't have guides for technicians, to end up to end up finding a defect like that.Michael: [00:31:21] and the poor technician, you know, he's dealing with hundreds of different devices, whatever it is, you know, whatever the field is, whether it's phones or speakers or printers or computers or whatever. And, you know, this guy is not working with the same, software day in and day out. The way we have to sometimes again, because the developer is sort of building the, the tool that will do the stuff, you know, we're, we're dealing with the stuff it's doing. And so in a lot of ways, uh, we can bring our own level of expertise to a product. Uh, we can surpass, you know, the developer even, it's not like a contest, right.but just in terms of, you know, how many times is a developer installing it for the first time, like Maxwell was saying, what, when we do out of box testing, we have to reset everything and install it fresh over and over and over and over again. And so, so we wind up being exposed to this particular, you know, series of steps that the end user might only see a couple of times, but you know, who wants their brand new shiny thing, especially if it costs hundreds of dollars, you know, you don't want to have a lot of friction points in the process of finally using it.You know, you just kind of want it to just work as effectively as possible.Jeremy: [00:32:45] if I understood correctly in, in Maxwell's example, that would be, you had a physical product, like let's say a pair of headphones or something like that, and you need to upgrade, the firmware or. Perform some kind of reset. And that's something that like you were saying, a technician would normally have to go through, but, as QA, you go in and do the same process and realize like, this process is really difficult and it's really easy to make a mistake or, um, just not do it properly at all.and then, so you can propose like either, you know, ways to improve those steps or just show the developers like, Hey, look, I have to do all these things just to, you know, just update my firmware. Um, you might want to consider like, making that a little easier on, on your customers. Yeah.Maxwell: [00:33:32] Absolutely. And the other nice thing about it, Jeremy is, you know, we don't look at it at a series of tests like that as lower level functionality. just because, um, you know, it's more for a technician to have run it. It's actually part of the update testing.So it's, so it's actually very intricate. as far as the design of the, of the product. We find a defect in how this system updates. It's usually going to be a critical defect. Um, we don't want the product to ever end up being a boat anchor or a doorstop. Right. So that's so that's, so that's what we're always trying to avoid.and in that scenario, it's one of those things where then we don't exactly close the book on it once we, once we figure out, okay, this, this was a difficult scenario. For the technician, we resolve it for the technician. And then we look at, bigger scope, how does this affect the update process in general?You know, does this affect the, uh, customers testing, that suite of test cases that we have for those update processes. you know, it can, it can extend to that as well. Uh, and, and then we look at it in terms of automation too. Uh, to see if there's any areas where we need to fix the automation tests. Michael: [00:34:46] it can be as simple as power loss during the update at exactly the wrong time. the system will recover if it happens within the first 50 seconds or the last 30 seconds, but there's this middle part where it's trying to reboot in the process of updating its own firmware. And if the power happens to go out, then. You're out of luckthat does not make for a good, reputation for the client. that, I mean, the first thing a customer that's unhappy about that kind of thing is going to do is tell everybody else about this horrible experience they had.Maxwell: [00:35:20] Right. And I can think of a great example, Michael, we had found a ad hoc defect. They had asked us to look in this particular area. There was a very rare customer complaints of update issues. but they could not find it with their automation. we had one analyst that amazingly enough, was able to pull the power.At the right exact time in the right exact sequence. And, and we reported the ticket and we were able to capture the logs for this incident. And they must have run this through 200,000 automated tests and they could not replicate what this human could do with his hands. Um, and it would have really amazed them after we had found it.cause they really had run it through that many automation tests, but it does, it does happen where you find those. Jeremy: [00:36:10] we we've been talking about, uh, in this case you were saying this was for Bose, which is a very, large company. And I think that. When the average developer thinks about quality assurance, they usually think about it in the context of, I have a big enterprise company. Um, I have a large staff, I have money to pay for a whole bunch of analysts, things like that.I want to go back to where Michael, you had mentioned how. one of your customers was for a, uh, a screen-sharing application. we had an interview with Spencer Dixon who's the CTO at Tuple. I believe that's the product you're referring to. So. I wonder if you could walk us through, like for somebody who has a business that's I want to say they're probably maybe four or five people, something like that.what's the process for them, bringing on a dedicated analysts or testers. given that you're coming in, you have no knowledge of their software. What's the process there like?Michael: [00:37:13] first of all, not to, not to kiss up, but the guys at Tuple are a really great bunch of guys. They're very easy to work with. we have like an hourly cap, per month, you know, to try to not exceed a certain number of hours. That agreement helps to manage their costs. They're very forthcoming. and they really have, folded us in to their development process. You know, they've given us access to their, uh, trouble ticket, uh, software.We use their internal instant messaging application, to double-check on, you know, expected results. And is this a new feature or is this something that's changed unintentionally? so when we first started working with them, there was really only one. person on the project. and this person was in essence, tasked with turning, the Excel checklist of features into suites of test cases.And, you know, you, you start with Make sure X happens when you click Y and then you make that the title of a test case. And, you know, once you get all the easy stuff done, then you go through the steps of making it happen. They offered us a number of very helpful sort of starting videos that they have on their website for how to use the software, by no means are they comprehensive. but it was enough to get us comfortable, you know, with the basic functionality and then you just wind up playing with the software a lot. they were very open to giving us the ramp up time that we needed in order to check all the different boxes, uh, both ones on their list. And then new ones that we found because, you know, there's, there's more than one. connection type, right? That can be just a voice call or there can be the screen sharing and you can show your local video from your computer camera, so you can see each other in a small box. And, you know, what order do you turn those things on?And, which one has to work before the next one can work? Or what if a person changes their preferences in the midst of a call and, you know, these are things that, fortunately Tuple's audience is a bunch of developers. So, uh, when their clients, their customers report a problem, uh, the report is extremely thorough because they know what they're talking about.And so the reproduction steps are pretty good, but we still, sometimes we'll run into a situation, that they've shared with us. It's like, we can't, we can't make this one happen. And I don't know. I mean, The getting back to the Bluetooth, like they've even had customers where, uh, I guess one headset used a different frequency. Uh, than another one, even though they were on the same Bluetooth version. And when he changed this customer, I shouldn't say he, the customer, whoever whomever, they are, uh, they changed from one headset to another and you know, the whole thing fell apart and it's like, how do you even, you know, cause you don't go to the store and look on the package and see, Oh, this particular, uh, you know, headphone uses 48 kilo Hertz for their, you know, At the outset. I didn't even know that that was a thing that could be a problem. Right. It just, you figured Bluetooth has its band of the telecom spectrum and, but, you know, anything's possible. So they gave us time to ramp up, you know, cause they knew that they didn't have any test cases and uh, over time now, there's a dedicated team of three people that are on the project regularly, but it can expand to as many as six, you know, because it's a sharing application, right?So you tend to need multiple computers involved, And yeah, we've really, we've really enjoyed a relationship with Tupelo and our, and our eagerly awaiting, uh, if there would be windows version, because there's so many times when we'll be working on another project even, and, you know, talking with the person and saying, Oh, I wish I could, you know, we could use Tuple cause then I could click the thing on your screen and you could see it happen instead of just, you know, um, they are working on a Linux version though.I don't think that's a trade secret. So that's, that's in the pipeline. We're excited about that. And these guys, they pay their bills in like two days. No customers do that. They're, they're really something.Jeremy: [00:41:14] I mean, I think that's a, a sort of a unique case because it is a screen sharing application, you have things like headsets and webcams and, you're making calls between machines. So, I guess if you're testing, you would have all these different laptops or workstations set up all talking to one another.So yeah, I can imagine why it would be really valuable to have, people or a team dedicated to that.Michael: [00:41:40] And external webcams. And you know, whether you're, you're like my Mac mini is a 2012, so it doesn't have the three band audio. port, right. It's got one for microphone and one for headphone. So that in itself is like, well, I wonder how many of their customers are really going to have an older machine, but, it wound up being an interesting challenge because then I had to, if I was doing a testing, I had to have a microphone sort of distinct from the headphones.And then that brings in a whole other nest of interactivity that you have to. Account for maybe the microphones USB based, you know, all sorts of craziness. Jeremy: [00:42:19] I'm wondering if you have projects where you not only have to use the client, but you also have to, to set up the server infrastructure so that you can run tests internally. I'm wondering if you do that with clients and if you do like, what's your process for learning? How do I set up a test environment? How do I make it behave like the real thing, things like that.Maxwell: [00:42:42] So the production and testing equipment is what the customers have right. It's basically to, to create that setup, we just need the equipment from them and the user guides and the less information, frankly, the better in those setups, because you want to mimic what the customer's scenario is, right? You don't want to mimic too pristine of a setup. and that's something that we're always careful about when we're doing that sort of setup. As far as more of the integration. and the, uh, sandbox testing bed, where you're testing a new build for regressions or what, or what have you, that's going to be going out. we'd be connected to a different server environment. Michael: [00:43:24] And with zebra technologies, they're their zebra designer, printer driver. Uh, they support windows 7 windows 8.1 windows 10 and windows server 2012, 2016, 2019. And in the case of the non server versions, both 32 bit and 64 bit, because apparently windows 10 32 bit is more common in Europe, I guess, than it is here.And even though, you know, windows 7 has been deprecated by Microsoft, they've still got a customer base, you know, still running, you know, don't fix what ain't broke. Right. So why would you update a machine if it's doing exactly what you want, you know, in your store or a business or whatever it is. And so we make a point of, of executing tests in all 10 environments.It, it can be tedious because windows 7 uh, 32 and 64 have their own quirks. So we always have to test those too, you know, windows 8 and windows 10. They're fairly similar, but you know, they keep updating windows 10 and so it keeps changing. and then when it's time, for their printer driver to go through the, uh, The windows logo testing, they call it that's like their, their hardware quality labs, hardware certification, uh, that Microsoft has, which in essence means when you run a software update on your computer, uh, if there's a new version of the driver, it'll download it from Microsoft servers.You don't have to go to the customer website and specifically seek it out. So we actually do, uh, certification testing for zebra, uh, with that driver in all of those same environments and then submit the final, package for Microsoft's approval. And that's, uh, that's actually been, uh, sort of a job of ours if you will, for several years now.And that it's not something you take lightly when you're dealing with Microsoft and actually this sort of circles back to the, the writing, the guides, Because, you know, there are instructions that come with the windows hardware lab kit, but it doesn't cover everything obviously. And we wound up creating our own internal zebra.Printer driver certification guide and it's over a hundred pages because we wanted to be sure to include every weird thing that happens, even if it's only sometimes, and be sure you set this before you do this, because in the wrong order, then it will fail. And he won't tell you why and all sorts of strange things.And we've of course, uh, when we were nearing completion on that guide. Our contact at zebra was actually wanted, wanted a copy. Um, cause you know, we're not they're only a QA vendor obviously. and so if there's anything that would help and they have other divisions too, you know, they do, uh, uh, they have a browser print application that allows you to print directly to the printer from a web browser without installing a driver and that's a whole separate division and, you know, but overall, all these divisions, you know, have the same end goal as we do, which is, you know, sort of reducing the friction for the customer, using the product.Jeremy: [00:46:31] That's an example of a case where it, it sounds, you said it's like a hundred pages, so you've got these, these test cases basically ballooning in size. And maybe more specifically towards, the average software project, as development continues, new features get added, the product becomes more complex.I would think that the number of tests would grow, but I would also think that it, it can't grow indefinitely. Right. There has to be a point where. it's just not worth going through, you know, X number of tests versus the value you're going to get. So I wonder how you, how you manage that as things get more complicated, how do you choose what to drop and what to continue on with?Michael: [00:47:15] it obviously depends on the client, in the case of zebra to use them again, You know when we first started working with them, they put together the test suites. We just executed the test cases, as time went by, They began letting us put the test suites together. Cause you know, we've been working with the same test cases and you know, trying to come up with a system.So we sort of spread out the use instead of it always being the same number of test cases, because what happens when you get, when you execute the same tests over and over again, and they don't fail. That doesn't mean that you're you fixed everything. It means that your tests are worthless. Eventually. so they actually, a couple of summers ago, they had us go through all of the test cases, looking at, uh, the various results to evaluate like, okay, if this is a test case that we've run 30 times and it hasn't failed for the last 28 times, is there really any value in running it at all anymore?Uh, so long as that particular functionality isn't being updated because they update their printer driver every few months when they come out with a new line of printers, but they're not really changing the core functionality of what any given printer can do. They're just adding, like model numbers and things like that.So when it comes to like the ability of the printer to generate such and such a barcode on a particular kind of media, like that only gets you so far. but when you have, you know, uh, some printers have RFID capability and some don't, and so then you can, you get to kind of mix it up a little bit, depending on what features are present on the model.So deprecation of, worn out test cases, uh, does help to mitigate, you know, the ballooning, test suite. I'm sure. I'm sure Bose has their own approachMaxwell: [00:49:02] Absolutely. there are certain features then might also fall off entirely, where, you'll look at how many users are actually using a certain feature. like Michael was saying, you know, there might not be any failures on this particular feature, plus it's not particularly being used a lot.So, so it's a good candidate for being automated. Right. so also we'll look at cases such as such as that. and we'll go through a test cycle scrubs. we've had to do, um, a series of, update matrixes that we've had to, um, progressively. look at how much of the market has already updated to a certain version.so if a certain part of the market, if 90% of the market has already updated to this version, You don't you no longer have to test from here to here as far as your update testing. So that's another way in which you can, which you slowly start to reduce test, test cases and coverage.but you're always, you're always looking at that with risk assessment in mind. Right. And, and you're, and you're looking at, you know, who are the end users that, you know, what, what's the, what's the customer impact. If we're, if we're pulling away. Um, or if we're automating this set of test cases.so, you know, we go about that very, uh, carefully, but, we've been gradually more and more involved in helping them assess, what test cases are the best ones to be manually run? cause those are the ones that we end up finding defects in time and time again.so those, so those are the areas. that we've really helped rather than having, you know, cause lots of times clients will, if they do have a QA department you know, the test cases will be written more in an automation type language. So it's like, okay, why don't we just automate these test cases to begin with?And it'll be very broad scope where they have everything is written as a test case, for the overall functionality. And it's just way too much as you're pointing out Jeremy and as features grow. It just, that just continues on. it has to be whittled down in the early stages to begin with.but that's how we, that's how we help out. to finally, you know, help manage these cycles to get them in a more reasonable, manual testing, cadence, right. And then having, having the automated section of test cases have that be, you know, the larger portion of the overall coverage as it should be in general.Jeremy: [00:51:29] so it sounds like there's this, this process of you working with, uh, the client figuring out, what are the test cases that. don't or haven't brought up an issue in a long time, or, the things that get the most or the least use from customers, things like that, you, you, you look at all this information to figure out what are the things that for our manual tests we can focus on, um, and try to push everything else. Like you said, into some automated tests.Michael: [00:51:59] So if, over time, we're starting to see these trends with older test cases or simpler test cases. You know, if we notice that there's a potential we'll bring that to the, to the client's attention.And we'll say, we were looking at this batch of tests for basic features and we happened to notice that they haven't, failed ever or in two years or whatever. Would you consider us dropping those, at least for the time being, see how things go. and you know, that way we're spending less of their time.So to speak, you know, on the whole testing process, because as you pointed out, like the more you build a thing, the more time you have to take, you know, to test it from one end to the other. but at the same time, uh, a number of our analysts are, um, OAST 508 trusted tester certified for accessibility testing, using screen readers and things like that.uh, it's interesting how many web applications, you know, it just becomes baked into the bones. Right. And so, you know, you'll be having a team meeting talking about. yesterday's work. Um, and somebody will mention, you know, when I, when I went to such and such page, you know, because this person happened to use, a stylus to change the custom colors of the webpage or something like that.Um, they'll say, you know, it really, it was not very accessible and there was light green, there was dark green, there was light blue, like I can, you know, and so I used my style sheet to make them. Red and yellow and whatever. and you see enough of that kind of stuff. And then that's an opportunity, to grow our engagement with the client, right?Because we can say by the, by, you know, we noticed these things, we do offer this as a service. If you wanted to fold that in or, you know, set it up as like a one-time thing, even, you know, it all depends on, how much value it can bring. The client, right. you know, we're not pushing sales, trying to, Oh, we'll always get more whatever.Um, but it's just about, when you see an opportunity, for, improvement of the client's product or, you know, uh, helping, uh, better secure their position in the market or, you know, however, however it works or could work to their advantage. You know, we sort of feel like it's our duty. To mention it as their partner.we also do data analysis, you know, we don't just do QA testing. I know that's the topic here, of course. but that is another, another way where, you know, our discerning analysts can find, one of our products or one of our clients rather. we do monthly, uh, call center. Like help desk calls.we analyze that data in aggregate and, you know, they'll find these little spikes, you know, on a certain day, say over there or a clutch over a week of people calling about a particular thing. And then we can say to the, to the client, you know, did you push a new feature that day or was it rainy that day?Or, you know, I mean, it could be any, and maybe the client doesn't care, but. But we see it. So we say it and, and let them decide what to do with the information.Jeremy: [00:55:08] The comment about accessibility is, is, um, is really good because it sounds like if you're a company and you're building up your product and you may not be aware of the accessibility issues, um, you have a tested by someone who's using a screen reader, you know, sees those issues with contrast and, and so on.And now the developer, they have like these, these specific, actionable things to do and potentially even, um, moved those into automated tests to go like, okay, we need to make sure that these UI elements have this level of contrast, things like that.Michael: [00:55:45] Yeah. And there's different screen readers too. you know, the, the certification process, like with the government to become a trusted tester uses one particular screen reader named Andy it's an initialism. Um, but there are others and, you know, then it's on us to become familiar with, you know, what else is out there because it's not like everyone is going to be using the same screen reader, just like not everyone uses the same browserMaxwell: [00:56:10] I think the clients realize that, yeah, we do have a good automation department, but is it well balanced with what they're doing manual QA wise? And I think that's where we often find that there's a little bit lacking that we can provide extra value for, or we can boost what is currently there.Michael: [00:56:28] Our employees are quality assurance analysts. They're not testers. They don't just come in, read the script, then, Pokemon go afterwards. we count on them to bring that critical eye, you know, and they're, and everyone's own unique perspective. Uh, when they go to use any given product, you know, Pay attention to what's happening.You know, even if it's not in the test case, you know, something might, you know, flash on the screen or there might be this pause before the next, uh, thing kicks off that you are waiting for. And that happens enough times and you kind of notice, like there's always this lag right before the next step, you know, and then you can check that out with.The developer, like, is this lag, do you guys care about this lag at all? And you know, sometimes we find out that it's unavoidable because something, you know, something under the hood has to happen before, the next thing can happen.Maxwell: [00:57:20] and even asking those questions, we've found out fascinating things like, you know, why is there this lag every time when we run this test, you know, we never want to want to derail a client too much. You know, we're always very patient for the answer. And sometimes we don't, you know, we might not get the answer, but I think that that does help build that level of respect between us and the developers, uh, that we really care what their, what their code is doing. And we want to understand, you know, if there is a slight hiccup what's causing that slight hiccup, it's, it's, it ends up being fascinating for our analysts as we are learning the product.And that's what makes us wanna want to really learn, um, exactly what that, what the code is doing.Michael: [00:58:02] though I'm not a developer. Um, when I first started at Aspiritech, I worked on Bose as well, and I really enjoyed just watching their code, scroll down the screen. As you know, the machine was booting up or the speaker was updating because you can learn all sorts of interesting things about what's happening, you know, that you don't see normally. There's all sorts of weird inside jokes, uh, in terms of like what they call the command or, you know, Oh, there's that same spelling error where it's only one T or, you know, things that you kind of, you kind of get to know the developers in a way, you know, like, Oh, so-and-so wrote that line.We always wondered. Cause there's only this one T and that word was supposed to have two teeth, you know, and they say, Oh yeah, we keep giving him a hard time about that. But now we can't change it because, so we have fun.Jeremy: [00:58:52] If people want to learn more about, what you guys are working on or about Aspiritech where should they head?Maxwell: [00:58:58] www.aspiritech.org. is our, is our website, head to there, uh, give you all the information you need about us. Michael: [00:59:06] We also have a LinkedIn presence, uh, that we've been trying to leverage lately and, uh, talk to our current clients. I mean, they've really, they've really been our biggest cheerleaders and the vast majority of our, of our work has come from client referrals. was, is an example of that too.You know, they were referred by a client who was referred, you know, we're very proud of that. You know, it speaks volumes about, about the quality of our work and the relationships that we build and, and, uh, you know, we have very little customer turnover in addition to very little staff turnover and that's because we invest in these relationships and then it seems to work for both sides.Jeremy: [00:59:46] Michael, maxwell, thanks for coming on the showMaxwell: [00:59:49] thank you so much, Jeremy. It's great talking to you.Michael: [00:59:52] Thanks for having us
Examples from: Justin Kan and Matt ArbesfeldAudio clips:- Justin Kan's origin story (26ish mins in)- Matt Arbesfeld of LogRocket in Software Engineering Radio (19 mins in)- Song: Shakira - Try Everything
2020 - нам будет тебя не хватать. Или нет? В заключительном выпуске года мы вспоминаем всё, что принёс нам год удалёнки и он-лайн ивентов: от null safety до релиза нового приложения Meduza. Приготовили мы и сюрприз. Гости этого выпуска подробно расскажут об интеграции Flutter с платформой VK Mini Apps. Что нужно сделать для запуска приложения на платформе, какие ограничения стоит учитывать и с какими непреодолимыми проблемами предстоит столкнуться - обо всём этом мы подробно расспросили Сашу и Юру. У микрофона: Евгений Сатуров (Surf), Александр Денисов (Epam), Александр Юрковский (Meditivity), Юрий Дорофеев (Одноклассники). Главное по выпуску: ❗️ VK Mini Apps: https://vk.com/dev/vk_apps_docs ❗️ Go Here в VK Mini Apps: https://vk.com/services?act=search&q=Go%2BHere&w=app7651642_8053093 ❗️ VK Bridge на pub.dev: https://pub.dev/packages/vk_bridge ❗️ Новый проект Юры и Саши: https://worddaily.ru/ Новости: ⚡️ Анонс официального мероприятия 03.03.2021: https://twitter.com/FlutterDev/status/1339256577480683521 ⚡️ Плейлист с докладами DartUp 2020: https://www.youtube.com/playlist?list=PLxcvsYzLfaTCH6RNIr7PyLrEZRlP6uKhn ⚡️ Обновлённый фреймворк для интеграционных тестов: https://medium.com/flutter/updates-on-flutter-testing-f54aa9f74c7e https://flutter.dev/docs/testing/integration-tests ⚡️ Статья о том, как принималось решение о способе реализации Null safety в Dart: https://medium.com/dartlang/why-nullable-types-7dd93c28c87a ⚡️ Тред для сбора фидбека по Dart null safety на Reddit: https://www.reddit.com/r/FlutterDev/comments/k6q7oz/looking_for_feedback_on_migrating_code_to_use/ ⚡️ Страничка со статусом переезда пакетов из pub.dev на Dart null safety: https://github.com/dart-lang/sdk/wiki/Null-safety-migration-status ⚡️ Большое интервью Тима Снита на Software Engineering Radio: https://www.se-radio.net/2020/12/episode-437-tim-sneath-on-architecture-of-flutter/?source=cssocial ⚡️ Fuchsia вышла в опен-сорс https://opensource.googleblog.com/2020/12/expanding-fuchsias-open-source-model.html Официальный канал подкаста: t.me/flutterdevpodcast_news Официальный чат подкаста: t.me/flutterdevpodcast
Shownotes Deze keer hebben we Felienne Hermans te gast. Felienne is in haar dagelijkse werk hoofddocent (Hoofd PERL research groep) aan de Universiteit in Leiden. Ze is lid van de TC39 ECMAScript/Javascript task group. Verder is ze een van de hosts van de Software Engineering Radio podcast. En Felienne is een van de oprichters van de Joy of Coding conferentie. Maar dan vindt ze nog tijd om op zaterdagen les te geven aan kinderen. We gaan het met Felienne hebben over het leren programmeren, wat mensen echte programmeertalen vinden, en allerlei zaken die Felienne bezig houden. Dus weer een aflevering die je niet wil missen! Onderwerpen: 00:00:00 - Intro 00:09:07 - Conferentie klets 00:15:26 - Leren programmeren 00:29:56 - Hoe leer je iemand programmeren? 01:00:04 - Waarom zou iedereen moeten kunnen programmeren? 01:11:42 - Wat zien mensen als echte programmeertalen? 01:43:40 - Tips 01:49:27 - Code Reading Club 01:56:08 - Programmeerbende boekbespreking met Dawoud Karmous Felienne Podcast - Software Engineering Radio Twitter - https://twitter.com/Felienne LinkedIn - https://www.linkedin.com/in/felienne Hedy programming - http://www.hedycode.com Code reading club - Blog post Gast bij Met Nerds om Tafel - S06E16 - Hoe je kinderen leert programmeren terwijl je voor een marathon traint Tips Twitch kanaal van Tae’lur Alexis aka Cyberbarbie - https://twitch.tv/cyberbarbie Content Security Policies door Matt Brunt - https://www.youtube.com/watch?v=oW1VUDMuHF0 Boeken Programmeerbende - http://programmeerbende.nl/ Sapiens - https://www.bol.com/nl/p/sapiens/9200000116370070/ Kom met ons kletsen in onze Slack! Find out more at http://codeklets.nl
Engineers love to experiment, but is experimenting with your career a good idea? In this episode of Programming Leadership, Marcus talks with salary negotiation expert Josh Doody about how this counterintuitive decision can benefit you over the long term. According to Doody, the key is to stop thinking in binary terms of “good” and “bad” outcomes and optimize for learning, instead. Doing so gives us a broader spectrum of results that we can use to decide how to proceed. This will allow us to better calculate risk while also avoiding Old Timer’s Disease and falling victim to loss aversion. Show Notes “Decision science” (3:55) Thinking of outcomes as a spectrum rather than “good” or “bad” (5:13) Optimize for learning, not outcomes (7:31) How career experimentation has helped Josh (11:17) Understanding the long game (16:23) How to avoid Short-Timer’s Disease (18:53) Using expected value to calculate risk (23:41) Loss aversion can impede our ability to accurately calculate risk (27:59) “Good” and “bad” are subjective terms when it comes to experimentation (32:39) Links Josh Doody on Twitter Antifragile, Nassim Taleb Should Belichick have gone for it on 4th and 2? by Josh Doody Marcus’s interview with Josh on Software Engineering Radio
Ben and Grant join BriteCore’s VP of Product Architecture Myles Loffler to discuss application messaging, commands, events, and more. Software Engineering Radio ep. 369: Derek Collison on Messaging Systems and NATS AWS EventBridge Kafka NATS AWS SNS Vuex/Redux Well-Architected Framework AWS re:Invent 2019: Application integration patterns for microservices The Tao of Microservices Martin Fowler - What do you mean by "Event-Driven?" Martin Fowler - Event Collaboration Martin Fowler - Smart endpoints, dumb pipes Follow us on Twitter at @DevBritePod. Follow Ben on Twitter at @deybhayden. Follow Grant on Twitter at @gmcconnaughey. Encourage Myles to send his first tweet in 11 years at @MylesLoffler. Music by Komiku.
Felienne Hermans is universitair hoofddocent aan de Universiteit Leiden bij het Leiden Institute of Advanced Computer Science, vaste warde in Software Engineering Radio, een engelstalige podcast softwareontwikkeling, en als lid van de TC39-commissie denkt ze mee over hoe het Javascript van de toekomst er-uit zou moeten zien. Als dat nog niet genoeg reden was om haar uit te nodigen dan is ze ook de maker van het massive open online course (ook wel MOOC) over Scratch programmeren voor kinderen en schreef zij mee aan het kinderboek ‘De Programmeerbende‘.Tijdschema0:00:00 Per ongeluk zwartrijden0:03:07 Aankondiging Felienne Hermans0:05:48 Programmeren in Excel0:07:55 Draaitabellen zijn leven0:10:18 Programmeren voor kinderen0:17:20 Tieners met websites0:31:21 Goede leraren en de tablettoekomst0:42:23 De Programmeerbende0:48:25 Marathons lopen en Sportrusten1:00:32 Vragen van de luisteraars1:06:15 Is HTML een programmeertaal?1:20:15 Tips1:31:14 AfkondigingTIPSFelienne HermansRepl.it Online C Editor and IDEDe ProgrammeerbendeSportrusten 14 KM trainen voor een marathonHet McDonald's-principe om te bepalen waar je gaat etenVerleng je uitademen om je hartslag omlaag te krijgenRandalCKY Alliance Discord serverMalwarebytes (of virusscanner zoals Bitdefender), ook voor MacOSJoostGa eens niet naar Domino’s of New York Pizza maar naar een onafhankelijke pizzaboerKnives Out
Indigo Director of Engineering, Transport - Adam ConradShow notes:Here's Adam's most recent episode on the Software Engineering Radio podcast:https://www.se-radio.net/2020/01/episode-394-chris-mccord-on-phoenix-liveview/Support the show (https://www.indigoag.com/the-terraton-challenge)
CTO and partner at Manifest Solutions, Author, Java User Group leader, Java Champion, Trusted Technical Advisor, and Talent Developer. Serverless Applications in AWS - Cost of functions - Example of applications - Language choice - Tools to simplify Links https://www.juddsolutions.com/ https://www.linkedin.com/in/christophermjudd/ https://twitter.com/javajudd https://javajudd.net/ Sponsor Today's sponsor is Datadog, a cloud monitoring platform for dynamic infrastructure, distributed tracing, and logging. You can use Datadog for Serverless to search, filter, and explore all your AWS Lambda functions in one place, with the ability to easily drill-down into the performance of any function. Troubleshoot performance issues with distributed traces and collect custom metrics in real time to monitor key business metrics without adding latency to your applications. See the bigger picture in the Service Map, which visualizes Lambda functions along with all their dependencies. To see how Datadog can help you get a handle on your applications, infrastructure, and serverless functions, sign up for a free trial today--and get a free t-shirt. Visit https://dtdg.co/6figureserverless to get started. Chris' Podcast Recommendations John Maxwell Podcast - https://johnmaxwellleadershippodcast.com The 6 Figure Developer - https://6figuredev.com/ Security Now - https://twit.tv/shows/security-now AWS Podcast - https://aws.amazon.com/podcasts/aws-podcast/ A Geek Leader - https://www.ageekleader.com/ Java Off Heap - https://www.javaoffheap.com/ .NET Rocks - https://www.dotnetrocks.com/ (Nobody is going to read this are they? This could ruin my reputation as @javajudd.) Software Engineering Radio - https://www.se-radio.net/ No Fluff Just Stuff Podcast - https://nofluffjuststuff.com/podcast Consulting Success - https://www.consultingsuccess.com/podcast TWiT - https://twit.tv/shows/this-week-in-tech Sell or Die - https://www.sellordiepodcast.com/ The Changelog - https://changelog.com/podcast "Tempting Time" by Animals As Leaders used with permissions - All Rights Reserved × Subscribe now! Never miss a post, subscribe to The 6 Figure Developer Podcast! Are you interested in being a guest on The 6 Figure Developer Podcast? Click here to check availability!
Will Larson on Greater Than Code, Marcus Blankenship on Software Engineering Radio, Sonal Chokshi on Software Engineering Daily, Roman Pichler on Being Human, and Dave Thomas and Andy Hunt on Hanselminutes. I’d love for you to email me with any comments about the show or any suggestions for podcasts I might want to feature. Email podcast@thekguy.com. And, if you haven’t done it already, don’t forget to hit the subscribe button. This episode covers the five podcast episodes I found most interesting and wanted to share links to during the two week period starting September 2, 2019. These podcast episodes may have been released much earlier, but this was the fortnight when I started sharing links to them to my social network followers. WILL LARSON ON GREATER THAN CODE The Greater Than Code podcast featured Will Larson with hosts Jessica Kerr, Arty Starr, and Rein Henrichs. Will talked about systems thinking, specifically referencing Donella Meadows’ Thinking in Systems: A Primer. As a sixteen-year-old, he was exposed to systems thinking by his economics professor father. They talked about how to bring about change in complex systems and Rein brought up Virginia Satir’s change model. They talked about various forms of dysfunction, with an example being tasks that are marked as completed by developers without first doing the work of validation. Will’s own example is that executives never miss their goals; they just redefine the goals so that they hit them. There is a certain level of seniority where you can never be held accountable because you are the accountability function. Getting back into the topic of how to change complex systems, Will referenced the book, The First 90 Days as a great explanation of the need to go slow and observe before you try to change things. He says that the “great man theory” has been out of style for decades in the study of history, but is still in style in tech as the most causal way to understand how change works and also the most comforting. Rein talked about how the heroic individual myth is the other side of the coin to the scapegoat. Just as you pile all the blame onto the scapegoat, you pile all the credit onto the hero. He says that cultures that engage in hero myth-building are also likely to engage in scapegoating. Will says he himself has not seen much scapegoating at the companies he works at, likely because those cultures were unwilling to hold folks accountable for their work, but he has seen the hero myth at every company he has worked. Will then spoke about the 10x engineer myth. Will says he meets people who have been in tech for six or seven years who have the idea that they are almost done with their career. It may be due to the “senior engineer after two years” phenomenon where the career path is not well-defined and a lot of companies don’t know how to take advantage of the skills of people with 15 to 20 years of experience. A second reason is that the industry is an overwhelming and draining environment and people choose to opt out of it. As a result, we have very few engineers who have been around long enough to witness the long-term consequences of their brilliant ideas. Apple Podcasts link: https://podcasts.apple.com/ca/podcast/142-modeling-constraints-in-human-systems-with-will-larson/id1163023878?i=1000446345964 Website link: https://www.greaterthancode.com/modeling-constraints-in-human-systems MARCUS BLANKENSHIP ON SOFTWARE ENGINEERING RADIO The Software Engineering Radio podcast featured Marcus Blankenship with host Travis Kimmel. They talked about motivation, specifically motivation of engineering teams. Marcus says that motivation is the desire to get things done and every engineer coming out of school is motivated from day one. If you get one of these people hired onto your team and, two years later, they are demotivated, suffering from PTSD, scared to offer ideas, and figuring they are just a cog in a machine, your problem is your company or your team, not the engineer you hired. Marcus says he is doing secret research on motivation as he is now interviewing candidates for a job and asking them why they are looking to leave their current job. Nobody says, “Pay.” Often the answer is a lack of alignment with their boss or their company, resulting in the engineer losing the desire to contribute because of a relationship problem. These engineers are not stick-in-the-muds that are angry they don’t get to use COBOL anymore. Something happened where instead of having their ideas valued and heard and being part of the discussion, they somehow got disconnected from their boss. In the seventies, Marcus says, researchers discovered a strong correlation between positive employer-employee relationships and the amount of job satisfaction, quality of work, turnover intentions, and amount of promotions. We are thirty-five years into a few thousand scientific studies that continue to prove that the relationship one has with one’s supervisor matters more than any other factor when it comes to job performance and job satisfaction. Marcus says that a supervisor’s one true job is to create a trusting relationship with the people that report to you. Travis shared his own experience in having one-on-ones with his supervisors that felt to him like they were trying to artificial manufacturing a relationship because there was no indication of what the goal of the meeting was. Marcus says that good one-on-ones are bi-directional. One-on-ones in which the boss just gets status updates from the subordinate and gives new marching orders are often dissatisfying for both parties. Another flawed kind of one-on-one is where it is all about the employee. Such one-on-ones are not effective and neither party likes these either. Marcus suggests that we apply to our one-on-ones the same Agile thinking that we apply to our work. Every month, at one of your one-on-ones, do a retro on the one-on-one. Talk about why you are doing them, what value you’re getting from them, and how to make them better. They talked about psychological safety. Marcus says a lot of managers don’t realize that they are not in a good position to measure psychological safety based on their own gut. He says tools like Claire Lew’s knowyourteam.com, officevibe.com, and other anonymous survey tools can help. When we become a manager or team lead that has you supervising or leading, we forget that we are in a position of power. Travis added that leaders need to be careful about what they say casually so that it doesn’t get taken as a mandate. Apple Podcasts link: https://podcasts.apple.com/ca/podcast/episode-374-marcus-blankenship-on-motivating-programmers/id120906714?i=1000445260176 Website link: https://www.se-radio.net/2019/07/episode-374-marcus-blankenship-on-motivating-programmers/ SONAL CHOKSHI ON SOFTWARE ENGINEERING DAILY The Software Engineering Daily podcast featured a16z podcast host Sonal Chokshi with host Jeff Meyerson. Jeff started out by asking why a VC firm decided to start a podcast. Sonal says that a16z has always had a culture of writing, blogging, and sharing ideas. This led them to develop an editorial operation from which the podcast naturally followed. Jeff asked what lessons from blogging apply to podcasting. Sonal sees podcasting as the next evolution of blogging because of its similar intimacy and a similar feeling of authenticity. The difference, she says, is that podcasting is a community and a movement. Sonal talked about her favorite a16z episodes, including an episode on emojis. She loved it because everybody understands how to use emojis but there is a lot of deep tech and governance involved in making emojis possible. That episode, she said, encapsulates the whole a16z podcast: the intersection of technology, people, politics, context, culture, and humanity. Jeff brought up a16z’s connection to Mike Ovitz’s Creative Artists Agency. Having read Ovitz’s book and noticed how it portrays Ovitz as a workaholic, Jeff asked Sonal how she finds balance while drinking from the addicting technological firehose. Sonal says there is a lack of nuance in the debates about screen time and work/life balance. Apple Podcasts link: https://podcasts.apple.com/ca/podcast/a16z-podcasting-with-sonal-chokshi/id1019576853?i=1000446547922 Website link: https://softwareengineeringdaily.com/2019/08/09/a16z-podcasting-with-sonal-chokshi/ ROMAN PICHLER ON BEING HUMAN The Being Human podcast featured Roman Pichler with host Richard Atherton. Richard asked Roman what a product manager is. Roman says a product manager is someone who takes an idea and helps bring it to life, launch it, make it successful, and keep it successful. Richard asked about the distinction between a product manager and Scrum’s notion of product owner. Roman sees the product owner as a product management role, but methodologies like SAFe have redefined the product owner to be a tactical role, misunderstanding the intention behind the role and the practicalities such as answering questions from the dev team, refining backlog items, and answering support and sales questions. He says there is too much focus on the details and this risks losing sight of the big picture. To do a good job for users and for the business, Roman says it is helpful to have people looking after digital assets with the right qualifications, skills, organizational support, authority, and autonomy. He says the term “mini-CEO” appeals to some product people because it indicates that product people need a certain level of authority, but a CEO would have marketing and sales functions under their control and product people do not. Richard asked what talents Roman had to develop to be a great product person. Roman started out as a programmer and began to help business groups come up with new products. What helped him most was to boost his own understanding of how business works and the second most important element was letting go of being interested in how digital products work and focusing instead on who benefits from them. Apple Podcasts link: https://podcasts.apple.com/ca/podcast/72-to-manage-products-is-to-care-with-roman-pichler/id1369745673?i=1000446514943 Website link: http://media.cdn.shoutengine.com/podcasts/4081235a-554f-4a8f-90c2-77dc3b58051f/audio/9b2501e7-e618-46f6-8f41-abd69c871211.mp3 DAVE THOMAS AND ANDY HUNT ON HANSELMINUTES The Hanselminutes podcast featured Dave Thomas and Andy Hunt with host Scott Hanselman. Scott started by asking whether Dave and Andy knew at the time they wrote the Pragmatic Programmer 20 years ago that they were writing what would become a seminal work. Dave said that both of them were stunned by its success. The book was intended as a way to clarify their own thoughts based on their experiences as consultants in which their clients all had the same kinds of problems: inconsistent builds, the shipping of untested code, and impossible-to-change designs. Scott asked about the importance of the name of the book. Andy said that there was a strain of thought at the time the book was written that was dogmatic and they deliberately pushed against such approaches. Dave pointed out that this was harder on their readers because it forced them to figure out for themselves what works for them. They got into a discussion of what kind of educational background one needs to be a successful programmer. Dave revealed that he is currently teaching classes at SMU to, he says, corrupt the youth by teaching them things like functional programming, and because traditional computer science education is poorly serving the industry and the student. People are coming out of university with tens or hundreds of thousands of dollars of debt and, in terms of their value in the industry, they are not much different from people who are coming out of eight-week bootcamps. He teaches third or fourth year undergraduates and graduate students and he has found that none have been shown any form of testing. He would much rather hire someone who had the right attitude, was smart, and who could talk to people and he could show such a person how to code while on the job. Andy added that he gets the feeling that most computer science programs are there to teach you to become a professor of computer science rather than a problem-solver. What Andy says people need to learn, and what university education is not providing, are problem-solving skills. Apple Podcasts link: https://podcasts.apple.com/ca/podcast/pragmatic-programmer-celebrates-20-years-dave-thomas/id117488860?i=1000446461596 Website link: https://hanselminutes.simplecast.com/episodes/the-pragmatic-programmer-celebrates-20-years-with-dave-thomas-and-andy-hunt-VBmLw9lP LINKS Ask questions, make comments, and let your voice be heard by emailing podcast@thekguy.com. Twitter: https://twitter.com/thekguy LinkedIn: https://www.linkedin.com/in/keithmmcdonald/ Facebook: https://www.facebook.com/thekguypage Instagram: https://www.instagram.com/the_k_guy/ YouTube: https://www.youtube.com/c/TheKGuy Website:
Venturi's Voice: Technology | Leadership | Staffing | Career | Innovation
In this episode Andy Davis talks Robert Bluman, an editor and host of the podcast Software Engineering Radio and also a lead DevOps Engineer. On the show he chats to Andy about moving from a software engineering role to a DevOps engineer. He also discusses his role at Software Engineering Radio. Robert Blumen is a DevOps engineer at Salesforce with more than 25 years’ experience in software development, programming, architecture, and media production. His interests include software architecture, scalability, high availability, persistence, big data, and devops. Robert is a graduate of Stanford University and the University of California Berkeley. Show Notes: 1.09 The history of Software Engineering Radio. 2.27 The sharing community that has been built around the technical community. 5.25 The importance of having a strong team dynamic as well as a team with great technical skills. 7.11 What were the reasons behind you moving to a DevOps position. 8.21 DevOps as a cultural shift not a technical discipline. 9.24 Will DevOps evolve again? 10.58 Do you still work on projects around work? 12.02 Progressing into the managerial role? 12.37 Company’s creating paths for employees who don’t want to progress into management. 14.22 How have your responsibilities with software engineering radio changed since you started. 17.04 What’s the typical feedback you like to receive for software engineering radio. 19.23 Audience interaction with your podcast. 21.34 Generating revenue from podcasts. 23.14 Podcasts and there competitive marketplace. 24.55 Committing to producing content. 26.58 How long did it take Software engineering radio to grow?
Venturi's Voice: Technology | Leadership | Staffing | Career | Innovation
In this episode Andy Davis talks to Jeffrey Meyerson, host of Software Engineering Daily. On the show Jeff chats to Andy about the reasons behind starting Software Engineering Daily and the direction the show is heading. Jeff also shares some of his career advice for those starting a career in software engineering. Jeffrey Meyerson: After working for Software Engineering Radio for a few years and working at Amazon Jeff decided to form Software Engineering Radio as a full time career. The scope of software engineering is growing. If you use WordPress, Ableton or PhotoShop, you are a software engineer. If you use Facebook, you are a software engineer–you are helping build software. On a twenty year time horizon, there is no “opting out” of knowing how software works. But this is OK because learning about software is possible (and fun) no matter Show Notes: 1.07 How did you begin Software Engineering Daily. 2.14 How do you get the resources together to make a great show? 4.02 How did your career progress and how did you approach your career progression. 7.41 Using Software Engineering Daily to educate budding software engineers on the opportunities available to them. 11.18 Looking for a career that you actually enjoy. 13.00 What do the next few years look like for Software Engineering Daily. 16.45 What have some of the hurdles been when founding Software Daily. 19.37 What are some of the key lessons you feel you’ve learned on Software Engineering Daily. 22.56 Looking at podcast techniques using the Joe Rogan show as an examples. 24.39 Using podcasts to collaborate and connect with people and their links with the the open source tech community.
Software Engineering Daily was started a year and a half ago, based on what I learned from my podcasting experience on Software Engineering Radio. Last week, I interviewed Robert Blumen, the editor of Software Engineering Radio, about how that podcast is produced. In today’s episode, Robert interviews me about this podcast. If you are thinking The post How Software Engineering Daily Works appeared first on Software Engineering Daily.
Four years ago, I started volunteering for a popular podcast about software–Software Engineering Radio. For the next two years, I learned about the process of making a quality podcast about engineering. With its emphasis on preparation, timeless engineering principles, and attention to the listener, Software Engineering Radio continues to be one of the most popular The post Software Podcasting with Robert Blumen appeared first on Software Engineering Daily.
Software developers need to stay up with the latest technologies available out there. There are new technologies coming up every day, new programming languages being presented to the world, old stuffing getting down, new ones coming up, etc. Staying up with all that stuff is difficult. Where should you be searching for all these information? Books get outdated very quickly, websites are hard to keep if you don't have time to sit down and read, etc. Podcasts are an awesome way to keep you up to date with all the information that is out there. You can download and listen to while you're driving, running, at the gym, etc. This is why I decided to create this video with my Top 10 Developer Podcasts you should be definitely be listening to. Wanna know what my top 10 is? Watch this video and find out! The Ultimate List Of Developer Podcasts: https://simpleprogrammer.com/2016/10/29/ultimate-list-developer-podcasts/ Soft Skills Book: http://simpleprogrammer.com/softskills LIST OF DEVELOPER PODCASTS .NET Rocks Podcast: https://simpleprogrammer.com/dotnetrocks Herding Code Podcast: https://simpleprogrammer.com/herdingcode The Hansel Minutes Podcast: https://simpleprogrammer.com/hanselminutes Developer On Fire Podcast: https://simpleprogrammer.com/developeronfire Talk Python To Me Podcast: https://simpleprogrammer.com/talkpythonme Code Newbie Podcast: https://simpleprogrammer.com/codenewbie Software Engineering Radio: https://simpleprogrammer.com/seradio JavaScript Jabbers Podcast: https://simpleprogrammer.com/javajabbers Ruby Rogues Podcast: https://simpleprogrammer.com/rubyrogues The Changelog Podcast: https://simpleprogrammer.com/changelog BONUS EntreProgrammers Podcast: https://simpleprogrammer.com/entreprogrammerspodcast Simple Programmer Podcast: https://simpleprogrammer.com/sppodcast
Kent Beck's twitter profile says "Programmer, author, father, husband, goat farmer". But I know him best from his work on extreme programming, test first programming, and test driven development. He's the one. The reason you know about TDD is because of Kent Beck. I first ran across writings from Kent Beck as started exploring Extreme Programming in the early 2000's. Although I don't agree with all of the views he's expressed in his long and verbose career, I respect him as one of the best sources of information about software development, engineering practices, and software testing. Along with Test First Programming and Test Driven Development, Kent started an automated test framework that turned into jUnit. jUnit and it's model of setup and teardown wrapping test functions, as well base test class driven test frameworks became what we know of as xUnit style frameworks now, which includes Python's unittest. He discussed this history and a lot more on episode 122 of Software Engineering Radio. The episode is titled "The History of JUnit and the Future of Testing with Kent Beck", and is from Sept 26, 2010. http://www.se-radio.net/2010/09/episode-167-the-history-of-junit-and-the-future-of-testing-with-kent-beck/ I urge you to download it and listen to the whole thing. It's a great interview, still relevant, and applicable to testing in any language, including Python. What I've done in this podcast is take a handful of clips from the interview (with permission from IEEE and SERadio), and discuss the clips and my opinions a bit. The lessons are: You're tests should tell a story. Be careful of DRY, inheritance, and other software development practices that might get in the way of keeping your tests easy to understand. All test should help differentiate good programs from bad programs and not be redundant. Test at multiple levels and multiple scales where it makes sense. Differentiating between TDD, BDD, ATDD, etc. isn't as important as testing your software to learn about it. Who cares what you call it.
Software Engineering Radio - The Podcast for Professional Software Developers
In der sechsundzwanzigsten Episode des Anwendungsentwickler-Podcasts gebe ich Tipps für andere interessante Podcasts für Softwareentwickler. Inhalt Softwareentwicklung allgemein Software Engineering Radio heise Developer CodeNewbie innoQ Podcast This Developer’s Life Hanselminutes Softwareentwicklung speziell .NET .NET Rocks! Herding Code This Week On Channel 9 Java The Java Posse (eingestellt) Java Public House Illegal Argument Ruby Rogues Functional... Der Beitrag Podcasts für Softwareentwickler – Anwendungsentwickler-Podcast #26 erschien zuerst auf IT-Berufe-Podcast.
02:52 - David Herman Introduction Twitter Blog JavaScript Jabber Episode #54: JavaScript Parsing, ASTs, and Language Grammar w/ David Herman and Ariya Hidayat JavaScript Jabber Episode #44: Book Club! Effective JavaScript with David Herman Effective JavaScript by David Herman @effectivejs TC39 Mozilla 03:50 - The Rust Programming Language [GitHub] rust 06:31 - “Systems Programming Without Fear” 07:38 - High vs Low-level Programming Languages Garbage Collection and Deallocation Memory Safety Performance and Control Over Performance 11:44 - Stack vs Heap Memory Etymology of "Foo" RAII (Resource Acquisition Is Initialization) 16:52 - The Core of Rust Ownership Type System 24:23 - Segmentation Fault (Seg Faults) 27:51 - How much should programmers care about programming languages? Andrew Oppenlander: Rust FFI (Embedding Rust in projects for safe, concurrent, and fast code anywhere.) 32:43 - Concurrency and Multithreaded Programming 35:06 - Rust vs Go 37:58 - servo 40:27 - asm.js emscripten 42:19 - Cool Apps Built with Rust Skylight Wit.ai 45:04 - What hardware architectures does the Rust target? 45:46 - Learning Rust Rust for Rubyists by Steve Klabnik Picks Software Engineering Radio (Dave) How Will You Measure Your Life? by Clayton M. Christensen (Dave) The Presidents of the United States of America (Dave) Design Patterns in C (AJ) Microsoft Edge Dev Blog: Bringing Asm.js to Chakra and Microsoft Edge (AJ) The Web Platform Podcast: Episode 43: Modern JavaScript with ES6 & ES7 (AJ) Firefox Fame Phone (AJ) iTunes U CS106A (Programming Methodology) (Aimee) Valerian Root on Etsy (Aimee) The Dear Hunter - Live (Jamison) Designing Data-Intensive Applications by Martin Kleppmann (Jamison) Fogus: Perlis Languages (Jamison) Galactic Civilizations III (Joe) Visual Studio Code (Joe) Tessel 2 (Dave) Event Driven: How to Run Memorable Tech Conferences by Leah Silber (Dave) Plush Hello Kitty Doll (Dave)
02:52 - David Herman Introduction Twitter Blog JavaScript Jabber Episode #54: JavaScript Parsing, ASTs, and Language Grammar w/ David Herman and Ariya Hidayat JavaScript Jabber Episode #44: Book Club! Effective JavaScript with David Herman Effective JavaScript by David Herman @effectivejs TC39 Mozilla 03:50 - The Rust Programming Language [GitHub] rust 06:31 - “Systems Programming Without Fear” 07:38 - High vs Low-level Programming Languages Garbage Collection and Deallocation Memory Safety Performance and Control Over Performance 11:44 - Stack vs Heap Memory Etymology of "Foo" RAII (Resource Acquisition Is Initialization) 16:52 - The Core of Rust Ownership Type System 24:23 - Segmentation Fault (Seg Faults) 27:51 - How much should programmers care about programming languages? Andrew Oppenlander: Rust FFI (Embedding Rust in projects for safe, concurrent, and fast code anywhere.) 32:43 - Concurrency and Multithreaded Programming 35:06 - Rust vs Go 37:58 - servo 40:27 - asm.js emscripten 42:19 - Cool Apps Built with Rust Skylight Wit.ai 45:04 - What hardware architectures does the Rust target? 45:46 - Learning Rust Rust for Rubyists by Steve Klabnik Picks Software Engineering Radio (Dave) How Will You Measure Your Life? by Clayton M. Christensen (Dave) The Presidents of the United States of America (Dave) Design Patterns in C (AJ) Microsoft Edge Dev Blog: Bringing Asm.js to Chakra and Microsoft Edge (AJ) The Web Platform Podcast: Episode 43: Modern JavaScript with ES6 & ES7 (AJ) Firefox Fame Phone (AJ) iTunes U CS106A (Programming Methodology) (Aimee) Valerian Root on Etsy (Aimee) The Dear Hunter - Live (Jamison) Designing Data-Intensive Applications by Martin Kleppmann (Jamison) Fogus: Perlis Languages (Jamison) Galactic Civilizations III (Joe) Visual Studio Code (Joe) Tessel 2 (Dave) Event Driven: How to Run Memorable Tech Conferences by Leah Silber (Dave) Plush Hello Kitty Doll (Dave)
02:52 - David Herman Introduction Twitter Blog JavaScript Jabber Episode #54: JavaScript Parsing, ASTs, and Language Grammar w/ David Herman and Ariya Hidayat JavaScript Jabber Episode #44: Book Club! Effective JavaScript with David Herman Effective JavaScript by David Herman @effectivejs TC39 Mozilla 03:50 - The Rust Programming Language [GitHub] rust 06:31 - “Systems Programming Without Fear” 07:38 - High vs Low-level Programming Languages Garbage Collection and Deallocation Memory Safety Performance and Control Over Performance 11:44 - Stack vs Heap Memory Etymology of "Foo" RAII (Resource Acquisition Is Initialization) 16:52 - The Core of Rust Ownership Type System 24:23 - Segmentation Fault (Seg Faults) 27:51 - How much should programmers care about programming languages? Andrew Oppenlander: Rust FFI (Embedding Rust in projects for safe, concurrent, and fast code anywhere.) 32:43 - Concurrency and Multithreaded Programming 35:06 - Rust vs Go 37:58 - servo 40:27 - asm.js emscripten 42:19 - Cool Apps Built with Rust Skylight Wit.ai 45:04 - What hardware architectures does the Rust target? 45:46 - Learning Rust Rust for Rubyists by Steve Klabnik Picks Software Engineering Radio (Dave) How Will You Measure Your Life? by Clayton M. Christensen (Dave) The Presidents of the United States of America (Dave) Design Patterns in C (AJ) Microsoft Edge Dev Blog: Bringing Asm.js to Chakra and Microsoft Edge (AJ) The Web Platform Podcast: Episode 43: Modern JavaScript with ES6 & ES7 (AJ) Firefox Fame Phone (AJ) iTunes U CS106A (Programming Methodology) (Aimee) Valerian Root on Etsy (Aimee) The Dear Hunter - Live (Jamison) Designing Data-Intensive Applications by Martin Kleppmann (Jamison) Fogus: Perlis Languages (Jamison) Galactic Civilizations III (Joe) Visual Studio Code (Joe) Tessel 2 (Dave) Event Driven: How to Run Memorable Tech Conferences by Leah Silber (Dave) Plush Hello Kitty Doll (Dave)
Johannes of Software Engineering Radio (http://www.se-radio.net) talks with Rachel Laycock and Max Lincoln from ThoughtWorks about continuous delivery on Windows. The outline includes: introduction to continuous delivery; continuous integration; DevOps and ChatOps; decisions to be taken when implementing continuous delivery on windows; build tools on windows; packaging and deploy on windows; infrastructure automation and infrastructure as code with chef, puppet or ansible; snowflake server versus phoenix server; Windows PowerShell; the relation between open source and Microsoft tools; why not to use Linux; current development in the windows DevOps space; licensing and proprietary software in continuous delivery. This discussion was recorded live at the ThoughtWorks North American Away Day, The Westin Peachtree, Atlanta, Georgia
Software Engineering Radio - The Podcast for Professional Software Developers
With this episode, Software Engineering Radio begins a series of interviews on social/nontechnical aspects of working as a software engineer as Tobias Kaatz talks to Randy Shoup, former CTO at KIXEYE, about hiring in the software industry. Prior to KIXEYE, Randy worked as director of engineering at Google for the Google App Engine and as […]
Letzte Episode haben wir ein Buch zum Interviews führen besprochen. In dieser Episode wollten wir mit jemand reden, der Erfahrungen mit Interviews für Podcast hat. Markus hat schon mehrere als hundert Interviews geführt - für seine Podcasts Omega Tau und Software Engineering Radio. Wir haben ihn gebeten, ein wenig aus dem Nähkästchen zu plaudern.
Software Engineering Radio - The Podcast for Professional Software Developers
For Episode 200 of Software Engineering Radio, Diomidis Spinellis interviews Markus Völter, the podcast’s founder. Markus works as an independent researcher, consultant, and coach for itemis AG in Stuttgart, Germany. His focus is on software architecture, model-driven software development and domain specific languages as well as on product line engineering. Markus also regularly publishes articles, […]
Avdi Grimm 是 Ruby 社区知名的资深程序员,作者和社区领袖。在这期节目中我们请来 Avdi 来聊聊怎样突破“中级天花板”来达到更高的层次。 About Avdi Grimm: Avdi Grimm Avdi's Publications Ruby Rogues Growing Object Oriented Software Guided by Tests Practical Object Oriented Design in Ruby Smalltalk Best Practice Patterns Factoring How Developers Stop Learning: Rise of the Expert Beginner Go ahead and make a mess (Sandi Metz) Who I want to hire (Chad Fowler) I feel the opposite of burn out, interview with Chad Fowler Software Engineering Radio Pair Programming with me Corey Haines, Software Journeyman The hitchhiker's guide to the galaxy Wide Teams DivShot FantastiCal Domain Driven Design Destroy All Software RubyTapas Special Guest: Avdi Grimm.
Guest: Cheryl Morris, Alex Orso, Li-Te Chang, Martin Robillard Host: Martin Lippert Over the last few years, Eclipse has grown from a simple little IDE into a platform for developing software that can be used at many levels of complexity. At OOPSLA 2002, IBM announced the first round of Eclipse Innovation Grants, which funded 49 academics and researchers to do explore the uses of Eclipse in their work. These projects expanded the base of the software platform, but it also created something just as valuable: personal experiences growing and using Eclipse for teaching and research. Then, at OOPSLA 2003, the grant recipients presented their results in the first Eclipse Technology Exchange (ETX) workshop. This year, the organizers return to ooPSLA for the fifth workshop in the series. It will be held on Sunday, October 21, the first day of workshops and tutorials. This year's workshop will focus on the use Eclipse as a platform in teaching and research. It provides academics and researchers an opportunity to share their results and to help others to build on these experiences. In addition to experience reports, the fifth ETX offers an opening keynote address by Jeff McAffer, IBM Rational, titled "Equinox -- Trends in Eclipse as a Runtime". Jeff leads the Eclipse Equinox OSGi team in "forging new ground for Eclipse as a runtime", seeking to do for the server what Eclipse has already done to client-side tool development. In this podcast, Martin Lippert of Software Engineering Radio chats with the organizers of the 2007 ETX organizers at ooPSLA -- Cheryl Morris (IBM Toronto), Alex Orso (Georgia Tech), Li-Te Chang (IBM Cambridge), and Martin Robillard (McGill University) -- to talk first about the history of the ETX workshops and poster sessions and then about some of the exciting elements of this year's workshop program.
Guest: Cheryl Morris, Alex Orso, Li-Te Chang, Martin Robillard Host: Martin Lippert Over the last few years, Eclipse has grown from a simple little IDE into a platform for developing software that can be used at many levels of complexity. At OOPSLA 2002, IBM announced the first round of Eclipse Innovation Grants, which funded 49 academics and researchers to do explore the uses of Eclipse in their work. These projects expanded the base of the software platform, but it also created something just as valuable: personal experiences growing and using Eclipse for teaching and research. Then, at OOPSLA 2003, the grant recipients presented their results in the first Eclipse Technology Exchange (ETX) workshop. This year, the organizers return to ooPSLA for the fifth workshop in the series. It will be held on Sunday, October 21, the first day of workshops and tutorials. This year's workshop will focus on the use Eclipse as a platform in teaching and research. It provides academics and researchers an opportunity to share their results and to help others to build on these experiences. In addition to experience reports, the fifth ETX offers an opening keynote address by Jeff McAffer, IBM Rational, titled "Equinox -- Trends in Eclipse as a Runtime". Jeff leads the Eclipse Equinox OSGi team in "forging new ground for Eclipse as a runtime", seeking to do for the server what Eclipse has already done to client-side tool development. In this podcast, Martin Lippert of Software Engineering Radio chats with the organizers of the 2007 ETX organizers at ooPSLA -- Cheryl Morris (IBM Toronto), Alex Orso (Georgia Tech), Li-Te Chang (IBM Cambridge), and Martin Robillard (McGill University) -- to talk first about the history of the ETX workshops and poster sessions and then about some of the exciting elements of this year's workshop program.