Specification that defines a software interface between an operating system and platform firmware
Need to change your boot order or check a startup setting? You'll need to get into UEFI first. There are three ways to do it, and at least one of them works no matter what situation your computer is in. I'll show you how each one works.
As recently as December 2025, the CERT Coordination Center (CERT/CC) at the Carnegie Mellon University Software Engineering Institute (SEI) documented a UEFI-related vulnerability in certain motherboard models, illustrating that early-boot firmware behavior continues to present security challenges even when exploitation requires local physical access. CERT/CC reported seven UEFI vulnerabilities in 2025, a small number compared to reported vulnerabilities in other software. However, the consequences of a UEFI attack are often more serious given the extremely high privileges UEFI firmware possesses. In our latest SEI Podcast, Vijay Sarvepalli, a senior information security architect specializing in vulnerability and threat analysis in CERT, sits down with Michael Winter, deputy technical director of threat analysis in CERT, to discuss research on and mitigation of UEFI vulnerabilities and to introduce the CERT UEFI parser, an open source tool that uses program analysis to reveal the architecture of UEFI software and explore this veiled source of vulnerabilities.
In this episode, we dive into a crucial decision for IT professionals and students preparing for their CompTIA exam: choosing between a clean install or an in-place upgrade of Windows. Understanding this choice is vital for effective tech exam prep and real-world IT skills development. We discuss the technician's approach to troubleshooting issues like slow laptops, pop-ups, crashes, and system instability, highlighting how an incorrect decision can lead to persistent problems such as corruption, malware, and driver conflicts. Tune in to boost your technology education and get practical insights for your CompTIA study guide journey.
From there, we zoom in on the planning that makes a Windows 11 installation succeed: verifying CPU support, RAM, storage, TPM 2.0, and Secure Boot before you start; staging essential device drivers so you do not lose Wi‑Fi or audio afterward; and checking application compatibility and licensing so an upgrade does not break business-critical software. We also talk about backups the way CompTIA exams expect you to think about them, and how time constraints and data criticality shape your real-world approach.
Then we get into execution: boot devices and boot order in BIOS vs UEFI, GPT vs MBR (and why a 4TB drive must use GPT), and NTFS vs FAT32 so you do not get trapped by file-size limits. We round it out with deployment methods like unattended installation, network deployment, and zero touch deployment, plus repair and recovery options that help you choose the least destructive fix first. If you're studying for CompTIA A+ or you just want to install Windows with confidence, this one gives you a clean, practical framework.
Subscribe, share this with a friend who “just clicks Next,” and leave a review with your biggest Windows install lesson. Support the show. Art by Sarah/Desmond. Music by Joakim Karud. Little chacha Productions. Juan Rodriguez can be reached at TikTok @ProfessorJrod, ProfessorJRod@gmail.com, @Prof_JRod, and Instagram ProfessorJRod.
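The GPT vs MBR and NTFS vs FAT32 limits mentioned in the episode description above come down to 32-bit field sizes. The following is a minimal sketch of that arithmetic, not material from the episode: it assumes traditional 512-byte logical sectors, and the helper names `needs_gpt` and `fits_on_fat32` are made up for illustration.

```python
# Assumption: 512-byte logical sectors (drives with 4096-byte logical sectors differ).
SECTOR_BYTES = 512

# MBR partition entries store start and length as 32-bit sector counts,
# so the addressable range is 2**32 sectors.
MBR_MAX_BYTES = (2**32) * SECTOR_BYTES  # 2 TiB, about 2.2 TB

# FAT32 stores each file's size in a 32-bit field.
FAT32_MAX_FILE_BYTES = 2**32 - 1  # one byte short of 4 GiB


def needs_gpt(disk_bytes: int) -> bool:
    """True if the disk is larger than MBR can address with 512-byte sectors."""
    return disk_bytes > MBR_MAX_BYTES


def fits_on_fat32(file_bytes: int) -> bool:
    """True if a single file of this size can be stored on FAT32."""
    return file_bytes <= FAT32_MAX_FILE_BYTES


if __name__ == "__main__":
    four_tb = 4 * 10**12  # a "4TB" drive as marketed (decimal terabytes)
    print("MBR limit in bytes:", MBR_MAX_BYTES)
    print("4TB drive needs GPT for full capacity:", needs_gpt(four_tb))
    print("5 GiB file fits on FAT32:", fits_on_fat32(5 * 1024**3))
```

Under these assumptions a 4TB drive exceeds the MBR range, which is why GPT is needed to use its full capacity, and any single file of 4 GiB or more is too large for FAT32.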
Ubuntu wants a leaner, stricter GRUB, and your favorite setup may not survive the cut. We break down what's really changing, and the practical ways to adapt. Plus, Chris moves on from one of his favorite open source apps.
Bryan Cantrill is the co-founder and CTO of Oxide Computer Company. We discuss why the biggest cloud providers don't use off the shelf hardware, how scaling data centers at samsung's scale exposed problems with hard drive firmware, how the values of NodeJS are in conflict with robust systems, choosing Rust, and the benefits of Oxide Computer's rack scale approach. This is an extended version of an interview posted on Software Engineering Radio. Related links Oxide Computer Oxide and Friends Illumos Platform as a Reflection of Values RFD 26 bhyve CockroachDB Heterogeneous Computing with Raja Koduri Transcript You can help correct transcripts on GitHub. Intro [00:00:00] Jeremy: Today I am talking to Bryan Cantrill. He's the co-founder and CTO of Oxide computer company, and he was previously the CTO of Joyent and he also co-authored the DTrace Tracing framework while he was at Sun Microsystems. [00:00:14] Jeremy: Bryan, welcome to Software Engineering radio. [00:00:17] Bryan: Uh, awesome. Thanks for having me. It's great to be here. [00:00:20] Jeremy: You're the CTO of a company that makes computers. But I think before we get into that, a lot of people who built software, now that the actual computer is abstracted away, they're using AWS or they're using some kind of cloud service. So I thought we could start by talking about, data centers. [00:00:41] Jeremy: 'cause you were. Previously working at Joyent, and I believe you got bought by Samsung and you've previously talked about how you had to figure out, how do I run things at Samsung's scale. So how, how, how was your experience with that? What, what were the challenges there? Samsung scale and migrating off the cloud [00:01:01] Bryan: Yeah, I mean, so at Joyent, and so Joyent was a cloud computing pioneer. Uh, we competed with the likes of AWS and then later GCP and Azure. Uh, and we, I mean, we were operating at a scale, right? 
We had a bunch of machines, a bunch of dcs, but ultimately we know we were a VC backed company and, you know, a small company by the standards of, certainly by Samsung standards. [00:01:25] Bryan: And so when, when Samsung bought the company, I mean, the reason by the way that Samsung bought Joyent is Samsung's. Cloud Bill was, uh, let's just say it was extremely large. They were spending an enormous amount of money every year on, on the public cloud. And they realized that in order to secure their fate economically, they had to be running on their own infrastructure. [00:01:51] Bryan: It did not make sense. And there's not, was not really a product that Samsung could go buy that would give them that on-prem cloud. Uh, I mean in that, in that regard, like the state of the market was really no different. And so they went looking for a company, uh, and bought, bought Joyent. And when we were on the inside of Samsung. [00:02:11] Bryan: That we learned about Samsung scale. And Samsung loves to talk about Samsung scale. And I gotta tell you, it is more than just chest thumping. Like Samsung Scale really is, I mean, just the, the sheer, the number of devices, the number of customers, just this absolute size. they really wanted to take us out to, to levels of scale, certainly that we had not seen. [00:02:31] Bryan: The reason for buying Joyent was to be able to stand up on their own infrastructure so that we were gonna go buy, we did go buy a bunch of hardware. Problems with server hardware at scale [00:02:40] Bryan: And I remember just thinking, God, I hope Dell is somehow magically better. I hope the problems that we have seen in the small, we just. You know, I just remember hoping and hope is hope. It was of course, a terrible strategy and it was a terrible strategy here too. 
Uh, and the we that the problems that we saw at the large were, and when you scale out the problems that you see kind of once or twice, you now see all the time and they become absolutely debilitating. [00:03:12] Bryan: And we saw a whole series of really debilitating problems. I mean, many ways, like comically debilitating, uh, in terms of, of showing just how bad the state-of-the-art. Yes. And we had, I mean, it should be said, we had great software and great software expertise, um, and we were controlling our own system software. [00:03:35] Bryan: But even controlling your own system software, your own host OS, your own control plane, which is what we had at Joyent, ultimately, you're pretty limited. You go, I mean, you got the problems that you can obviously solve, the ones that are in your own software, but the problems that are beneath you, the, the problems that are in the hardware platform, the problems that are in the componentry beneath you become the problems that are in the firmware. IO latency due to hard drive firmware [00:04:00] Bryan: Those problems become unresolvable and they are deeply, deeply frustrating. Um, and we just saw a bunch of 'em again, they were. Comical in retrospect, and I'll give you like a, a couple of concrete examples just to give, give you an idea of what kinda what you're looking at. one of the, our data centers had really pathological IO latency. [00:04:23] Bryan: we had a very, uh, database heavy workload. And this was kind of right at the period where you were still deploying on rotating media on hard drives. So this is like, so. An all flash buy did not make economic sense when we did this in, in 2016. This probably, it'd be interesting to know like when was the, the kind of the last time that that actual hard drives made sense? [00:04:50] Bryan: 'cause I feel this was close to it. 
So we had a, a bunch of, of a pathological IO problems, but we had one data center in which the outliers were actually quite a bit worse and there was so much going on in that system. It took us a long time to figure out like why. And because when, when you, when you're io when you're seeing worse io I mean you're naturally, you wanna understand like what's the workload doing? [00:05:14] Bryan: You're trying to take a first principles approach. What's the workload doing? So this is a very intensive database workload to support the, the object storage system that we had built called Manta. And that the, the metadata tier was stored and uh, was we were using Postgres for that. And that was just getting absolutely slaughtered. [00:05:34] Bryan: Um, and ultimately very IO bound with these kind of pathological IO latencies. Uh, and as we, you know, trying to like peel away the layers to figure out what was going on. And I finally had this thing. So it's like, okay, we are seeing at the, at the device layer, at the at, at the disc layer, we are seeing pathological outliers in this data center that we're not seeing anywhere else. [00:06:00] Bryan: And that does not make any sense. And the thought occurred to me. I'm like, well, maybe we are. Do we have like different. Different rev of firmware on our HGST drives, HGST. Now part of WD Western Digital were the drives that we had everywhere. And, um, so maybe we had a different, maybe I had a firmware bug. [00:06:20] Bryan: I, this would not be the first time in my life at all that I would have a drive firmware issue. Uh, and I went to go pull the firmware, rev, and I'm like, Toshiba makes hard drives? So we had, I mean. I had no idea that Toshiba even made hard drives, let alone that they were our, they were in our data center. [00:06:38] Bryan: I'm like, what is this? 
And as it turns out, and this is, you know, part of the, the challenge when you don't have an integrated system, which not to pick on them, but Dell doesn't, and what Dell would routinely put just sub make substitutes, and they make substitutes that they, you know, it's kind of like you're going to like, I don't know, Instacart or whatever, and they're out of the thing that you want. [00:07:03] Bryan: So, you know, you're, someone makes a substitute and like sometimes that's okay, but it's really not okay in a data center. And you really want to develop and validate a, an end-to-end integrated system. And in this case, like Toshiba doesn't, I mean, Toshiba does make hard drives, but they are a, or the data they did, uh, they basically were, uh, not competitive and they were not competitive in part for the reasons that we were discovering. [00:07:29] Bryan: They had really serious firmware issues. So the, these were drives that would just simply stop a, a stop acknowledging any reads from the order of 2,700 milliseconds. Long time, 2.7 seconds. Um. And that was a, it was a drive firmware issue, but it was highlighted like a much deeper issue, which was the simple lack of control that we had over our own destiny. [00:07:53] Bryan: Um, and it's an, it's, it's an example among many where Dell is making a decision. That lowers the cost of what they are providing you marginally, but it is then giving you a system that they shouldn't have any confidence in because it's not one that they've actually designed and they leave it to the customer, the end user, to make these discoveries. [00:08:18] Bryan: And these things happen up and down the stack. And for every, for whether it's, and, and not just to pick on Dell because it's, it's true for HPE, it's true for super micro, uh, it's true for your switch vendors. 
It's, it's true for storage vendors where the, the, the, the one that is left actually integrating these things and trying to make the the whole thing work is the end user sitting in their data center. AWS / Google are not buying off the shelf hardware but you can't use it [00:08:42] Bryan: There's not a product that they can buy that gives them elastic infrastructure, a cloud in their own DC The, the product that you buy is the public cloud. Like when you go in the public cloud, you don't worry about the stuff because that it's, it's AWS's issue or it's GCP's issue. And they are the ones that get this to ground. [00:09:02] Bryan: And they, and this was kind of, you know, the eye-opening moment. Not a surprise. Uh, they are not Dell customers. They're not HPE customers. They're not super micro customers. They have designed their own machines. And to varying degrees, depending on which one you're looking at. But they've taken the clean sheet of paper and the frustration that we had kind of at Joyent and beginning to wonder and then Samsung and kind of wondering what was next, uh, is that, that what they built was not available for purchase in the data center. [00:09:35] Bryan: You could only rent it in the public cloud. And our big belief is that public cloud computing is a really important revolution in infrastructure. Doesn't feel like a different, a deep thought, but cloud computing is a really important revolution. It shouldn't only be available to rent. You should be able to actually buy it. [00:09:53] Bryan: And there are a bunch of reasons for doing that. Uh, one in the one we we saw at Samsung is economics, which I think is still the dominant reason where it just does not make sense to rent all of your compute in perpetuity. But there are other reasons too. There's security, there's risk management, there's latency. [00:10:07] Bryan: There are a bunch of reasons why one might wanna to own one's own infrastructure. 
But, uh, that was very much the, the, so the, the genesis for Oxide was coming out of this very painful experience and a painful experience that, because, I mean, a long answer to your question about like what was it like to be at Samsung scale? [00:10:27] Bryan: Those are the kinds of things that we, I mean, in our other data centers, we didn't have Toshiba drives. We only had the HGST drives, but it's only when you get to this larger scale that you begin to see some of these pathologies. But these pathologies then are really debilitating in terms of those who are trying to develop a service on top of them. [00:10:45] Bryan: So it was, it was very educational in, in that regard. And we're very grateful for the experience at Samsung in terms of opening our eyes to the challenge of running at that kind of scale. [00:10:57] Jeremy: Yeah, because I, I think as software engineers, a lot of times we, we treat the hardware as a, as a given where, [00:11:08] Bryan: Yeah. [00:11:08] Bryan: Yeah. There's software in hard drives [00:11:09] Jeremy: It sounds like in, in this case, I mean, maybe the issue is not so much that Dell or HP as a company doesn't own every single piece that they're providing you, but rather the fact that they're swapping pieces in and out without advertising them, and then when it becomes a problem, they're not necessarily willing to, to deal with the, the consequences of that. [00:11:34] Bryan: They just don't know. I mean, I think they just genuinely don't know. I mean, I think that they, it's not like they're making a deliberate decision to kind of ship garbage. It's just that they are making, I mean, I think it's exactly what you said about like, not thinking about the hardware. It's like, what's a hard drive? [00:11:47] Bryan: Like what's it, I mean, it's a hard drive. It's got the same specs as this other hard drive and Intel. You know, it's a little bit cheaper, so why not? 
It's like, well, like there's some reasons why not, and one of the reasons why not is like, uh, even a hard drive, whether it's rotating media or, or flash, like that's not just hardware. [00:12:05] Bryan: There's software in there. And that the software's like not the same. I mean, there are components where it's like, there's actually, whether, you know, if, if you're looking at like a resistor or a capacitor or something like this Yeah. If you've got two, two parts that are within the same tolerance. Yeah. [00:12:19] Bryan: Like sure. Maybe, although even the EEs I think would be, would be, uh, objecting that a little bit. But the, the, the more complicated you get, and certainly once you get to the, the, the, the kind of the hardware that we think of like a, a, a microprocessor, a a network interface card, a a, a hard drive, an NVMe drive. [00:12:38] Bryan: Those things are super complicated and there's a whole bunch of software inside of those things, the firmware, and that's the stuff that, that you can't, I mean, you say that software engineers don't think about that. It's like you, no one can really think about that because it's proprietary that's kinda welded shut and you've got this abstraction into it. [00:12:55] Bryan: But the, the way that thing operates is very core to how the thing in aggregate will behave. And I think that you, the, the kind of, the, the fundamental difference between Oxide's approach and the approach that you get at a Dell, HP, Supermicro, wherever, is really thinking holistically in terms of hardware and software together in a system that, that ultimately delivers cloud computing to a user. [00:13:22] Bryan: And there's a lot of software at many, many, many, many different layers. And it's very important to think about, about that software and that hardware holistically as a single system. 
[00:13:34] Jeremy: And during that time at Joyent, when you experienced some of these issues, was it more of a case of you didn't have enough servers experiencing this? So if it would happen, you might say like, well, this one's not working, so maybe we'll just replace the hardware. What, what was the thought process when you were working at that smaller scale and, and how did these issues affect you? UEFI / Baseboard Management Controller [00:13:58] Bryan: Yeah, at the smaller scale, you, uh, you see fewer of them, right? You just see it's like, okay, we, you know, what you might see is like, that's weird. We kinda saw this in one machine versus seeing it in a hundred or a thousand or 10,000. Um, so you just, you just see them, uh, less frequently as a result, they are less debilitating. [00:14:16] Bryan: Um, I, I think that it's, when you go to that larger scale, those things that become, that were unusual now become routine and they become debilitating. Um, so it, it really is in many regards a function of scale. Uh, and then I think it was also, you know, it was a little bit dispiriting that kind of the substrate we were building on really had not improved. [00:14:39] Bryan: Um, and if you look at, you know, the, if you buy a computer server, buy an x86 server. There is a very low layer of firmware, the BIOS, the basic input output system, the UEFI BIOS, and this is like an abstraction layer that has, has existed since the eighties and hasn't really meaningfully improved. Um, the, the kind of the transition to UEFI happened with, I mean, I, I ironically with Itanium, um, you know, two decades ago. [00:15:08] Bryan: but beyond that, like this low layer, this lowest layer of platform enablement software is really only impeding the operability of the system. 
Um, you look at the baseboard management controller, which is the kind of the computer within the computer, there is a, uh, there is an element in the machine that needs to handle environmentals, that needs to handle, uh, operate the fans and so on. [00:15:31] Bryan: Uh, and that traditionally has been the baseboard management controller, and that architecturally just hasn't improved in the last two decades. And, you know, that's, it's a proprietary piece of silicon. Generally from a company that no one's ever heard of called ASPEED, uh, which has to be, is written in all caps, so I guess it needs to be screamed. [00:15:50] Bryan: Um, ASPEED has a proprietary part that has a, there is a root password infamously there, is there, the root password is encoded effectively in silicon. So, uh, which is just, and for, um, anyone who kind of goes deep into these things, like, oh my God, are you kidding me? Um, when we first started Oxide, the wifi password was a fraction of the ASPEED root password for the BMC. [00:16:16] Bryan: It's kinda like a little, little BMC humor. Um, but those things, it was just dispiriting that, that the, the state-of-the-art was still basically personal computers running in the data center. Um, and that's part of what, what was the motivation for doing something new. [00:16:32] Jeremy: And for the people using these systems, whether it's the baseboard management controller or it's the BIOS or UEFI component, what are the actual problems that people are seeing? Security vulnerabilities and poor practices in the BMC [00:16:51] Bryan: Oh man, I, the, you are going to have like some fraction of your listeners, maybe a big fraction where like, yeah, like what are the problems? That's a good question. And then you're gonna have the people that actually deal with these things who are, did like their heads already hit the desk being like, what are the problems? [00:17:06] Bryan: Like what are the non problems? Like what, what works? 
Actually, that's like a shorter answer. Um, I mean, there are so many problems and a lot of it is just like, I mean, there are problems just architecturally these things are just so, I mean, and you could, the problems spread to the horizon, so you can kind of start wherever you want. [00:17:24] Bryan: But I mean, as like, as a really concrete example. Okay, so the, the BMCs that, that the computer within the computer that needs to be on its own network. So you now have like not one network, you got two networks that, and that network, by the way, it, that's the network that you're gonna log into to like reset the machine when it's otherwise unresponsive. [00:17:44] Bryan: So that going into the BMC, you're able to control the entire machine. Well it's like, alright, so now I've got a second network that I need to manage. What is running on the BMC? Well, it's running some ancient, ancient version of Linux that you got. It's like, well how do I, how do I patch that? [00:18:02] Bryan: How do I like manage the vulnerabilities with that? Because if someone is able to root your BMC, they control the system. So it's like, this is not you've, and now you've gotta go deal with all of the operational hair around that. How do you upgrade that system updating the BMC? I mean, it's like you've got this like second shadow bad infrastructure that you have to go manage. [00:18:23] Bryan: Generally not open source. There's something called OpenBMC, um, which, um, people use to varying degrees, but you're generally stuck with the proprietary BMC, so you're generally stuck with, with iLO from HPE or iDRAC from Dell or, or, uh, the, uh, Supermicro's BMC, and you are, uh, it is just excruciating pain. [00:18:49] Bryan: Um, and that this is assuming that by the way, that everything is behaving correctly. The, the problem is that these things often don't behave correctly, and then the consequence of them not behaving correctly. 
It's really dire because it's at that lowest layer of the system. So, I mean, I'll give you a concrete example. [00:19:07] Bryan: A customer of theirs reported to me, so I won't disclose the vendor, but let's just say that a well-known vendor had an issue with their, their temperature sensors were broken. Um, and the thing would always read basically the wrong value. So it was the BMC that had to like, invent its own, a different kind of thermal control loop. [00:19:28] Bryan: And it would index on the, on the, the, the, the actual inrush current. It would, they would look at the current that's going into the CPU to adjust the fan speed. That's a great example of something like that's a, that's an interesting idea that doesn't work, 'cause that's actually not the temperature. [00:19:45] Bryan: So like that software would crank the fans whenever you had an inrush of current and this customer had a workload that would spike the current and when it would spike the current, the, the, the fans would kick up and then they would slowly degrade over time. Well, this workload was spiking the current faster than the fans would degrade, but not fast enough to actually heat up the part. [00:20:08] Bryan: And ultimately over a very long time, in a very painful investigation, this customer determined that like my fans are cranked in my data center for no reason. We're blowing cold air. And it's like that, this is on the order of like a hundred watts a server of, of energy that you shouldn't be spending and like that ultimately what that comes down to is this kind of broken software hardware interface at the lowest layer that has real meaningful consequence, uh, in terms of hundreds of kilowatts, um, across a data center. So this stuff has, has very, very, very real consequence and it's such a shadowy world. 
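The fan behavior described above can be reproduced with a toy simulation. This is a rough sketch and not the vendor's firmware: every number in it (duty-cycle range, decay rate, spike interval, temperature threshold) is a made-up assumption chosen only to show why a current-driven loop can keep fans cranked while the part stays cool.

```python
def simulate(ticks: int = 600, spike_every: int = 10, decay_per_tick: float = 1.0):
    """Return average fan duty (%) for a current-driven and a temperature-driven policy.

    All constants are illustrative assumptions, not measurements.
    """
    min_duty, max_duty = 20.0, 100.0
    temp_threshold_c = 70.0

    current_driven_fan = min_duty
    current_total = 0.0
    temp_total = 0.0

    for t in range(ticks):
        spike = (t % spike_every == 0)

        # Toy thermal model: brief current spikes barely move the die temperature.
        temp_c = 40.0 + (2.0 if spike else 0.0)

        # Policy A: treat inrush current as a proxy for heat.
        # Jump to full speed on a spike, then decay slowly.
        if spike:
            current_driven_fan = max_duty
        else:
            current_driven_fan = max(min_duty, current_driven_fan - decay_per_tick)

        # Policy B: react to the actual temperature reading.
        temp_driven_fan = max_duty if temp_c >= temp_threshold_c else min_duty

        current_total += current_driven_fan
        temp_total += temp_driven_fan

    return current_total / ticks, temp_total / ticks


if __name__ == "__main__":
    by_current, by_temp = simulate()
    print(f"average duty, current-driven: {by_current:.1f}%")
    print(f"average duty, temperature-driven: {by_temp:.1f}%")
```

Because the spikes arrive faster than the fan speed decays, the current-driven policy never gets back to its idle duty in this model, while the temperature-driven policy never leaves it.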
Part of the reason that, that your listeners that have dealt with this, that our heads will hit the desk is because it is really aggravating to deal with problems with this layer. [00:21:01] Bryan: You, you feel powerless. You don't control or really see the software that's on them. It's generally proprietary. You are relying on your vendor. Your vendor is telling you that like, boy, I don't know. You're the only customer seeing this. I mean, the number of times I have heard that for, and I, I have pledged that we're, we're not gonna say that at oxide because it's such an unaskable thing to say like, you're the only customer saying this. [00:21:25] Bryan: It's like, it feels like, are you blaming me for my problem? Feels like you're blaming me for my problem? Um, and what you begin to realize is that to a degree, these folks are speaking their own truth because the, the folks that are running at real scale at Hyperscale, those folks aren't Dell, HP super micro customers. [00:21:46] Bryan: They're actually, they've done their own thing. So it's like, yeah, Dell's not seeing that problem, um, because they're not running at the same scale. Um, but when you do run, you only have to run at modest scale before these things just become. Overwhelming in terms of the, the headwind that they present to people that wanna deploy infrastructure. The problem is felt with just a few racks [00:22:05] Jeremy: Yeah, so maybe to help people get some perspective at, at what point do you think that people start noticing or start feeling these problems? Because I imagine that if you're just have a few racks or [00:22:22] Bryan: do you have a couple racks or the, or do you wonder or just wondering because No, no, no. I would think, I think anyone who deploys any number of servers, especially now, especially if your experience is only in the cloud, you're gonna be like, what the hell is this? I mean, just again, just to get this thing working at all. 
[00:22:39] Bryan: It is so it, it's so hairy and so congealed, right? It's not designed. Um, and it, it, it, it's accreted it and it's so obviously accreted that you are, I mean, nobody who is setting up a rack of servers is gonna think to themselves like, yes, this is the right way to go do it. This all makes sense because it's, it's just not, it, I, it feels like the kit, I mean, kit car's almost too generous because it implies that there's like a set of plans to work to in the end. [00:23:08] Bryan: Uh, I mean, it, it, it's a bag of bolts. It's a bunch of parts that you're putting together. And so even at the smallest scales, that stuff is painful. Just architecturally, it's painful at the small scale then, but at least you can get it working. I think the stuff that then becomes debilitating at larger scale are the things that are, are worse than just like, I can't, like this thing is a mess to get working. [00:23:31] Bryan: It's like the, the, the fan issue that, um, where you are now seeing this over, you know, hundreds of machines or thousands of machines. Um, so I, it is painful at more or less all levels of scale. There's, there is no level at which the, the, the pc, which is really what this is, this is a, the, the personal computer architecture from the 1980s and there is really no level of scale where that's the right unit. Running elastic infrastructure is the hardware but also, hypervisor, distributed database, api, etc [00:23:57] Bryan: I mean, where that's the right thing to go deploy, especially if what you are trying to run. Is elastic infrastructure, a cloud. Because the other thing is like we, we've kinda been talking a lot about that hardware layer. Like hardware is, is just the start. Like you actually gotta go put software on that and actually run that as elastic infrastructure. [00:24:16] Bryan: So you need a hypervisor. Yes. But you need a lot more than that. 
You, you need to actually, you, you need a distributed database, you need web endpoints. You need, you need a CLI, you need all the stuff that you need to actually go run an actual service of compute or networking or storage. I mean, and for, for compute, even for compute, there's a ton of work to be done. [00:24:39] Bryan: And compute is by far, I would say the simplest of the, of the three. When you look at like networks, network services, storage services, there's a whole bunch of stuff that you need to go build in terms of distributed systems to actually offer that as a cloud. So it, I mean, it is painful at more or less every level if you are trying to deploy cloud computing on. What's a control plane? [00:25:00] Jeremy: And for someone who doesn't have experience building or working with this type of infrastructure, when you talk about a control plane, what, what does that do in the context of this system? [00:25:16] Bryan: So control plane is the thing that is, that is everything between your API request and that infrastructure actually being acted upon. So you go say, Hey, I, I want to provision a VM. Okay, great. We've got a whole bunch of things we're gonna provision with that. We're gonna provision a VM, we're gonna get some storage that's gonna go along with that, that's got a network storage service that's gonna come out of, uh, we've got a virtual network that we're gonna either create or attach to. [00:25:39] Bryan: We've got a, a whole bunch of things we need to go do for that. For all of these things, there are metadata components that need, we need to keep track of this thing that, beyond the actual infrastructure that we create. And then we need to go actually, like act on the actual compute elements, the host OS, what have you, the switches, what have you, and actually go [00:25:56] Bryan: create these underlying things and then connect them. And there's of course, the challenge of just getting that working is a big challenge. 
Um, but getting that working robustly, you know, when you go to provision a VM, all the steps that need to happen, and what happens if one of those steps fails along the way? [00:26:17] Bryan: One thing we're very mindful of is you get these long tails of like, generally our VM provisioning happened within this time, but we get these long tails where it takes much longer. What's going on? Where in this process are we actually spending time? [00:26:33] Bryan: Uh, and there's a whole lot of complexity that you need to go deal with there, in what is effectively this workflow that's gonna go create these things and manage them. Um, we use a pattern called sagas, which is actually a database pattern from the eighties. [00:26:51] Bryan: Uh, Caitie McCaffrey is a database researcher who, I think, reintroduced the idea of sagas in the last kind of decade. Um, and this is something that we picked up, and have done a lot of really interesting things with, to allow these workflows to be managed, and done so robustly, in a way that you can restart them and so on. [00:27:16] Bryan: Uh, and then you get this whole distributed system that can do all this. That whole distributed system, that itself needs to be reliable and available. So, you know, what happens if you pull a sled or if a sled fails? How does the system deal with that? [00:27:33] Bryan: How does the system deal with getting another sled added to the system? Like, how do you actually grow this distributed system? And then how do you update it? How do you actually go from one version to the next? And all of that has to happen across an air gap, where this is gonna run as part of the computer.
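Editor's sketch: the saga idea Bryan describes above, a workflow made of steps where each step knows how to undo itself, can be illustrated in a few lines of Rust. This is a minimal sketch under stated assumptions, not Oxide's implementation: the names `Step`, `run_saga`, and `provision_vm_steps` are hypothetical, the "infrastructure" is just a log of strings, and the sketch leaves out the persistence that would let a real saga be restarted after a crash.

```rust
/// One step of a saga: a forward action plus the action that undoes it.
/// (Hypothetical type, for illustration only.)
struct Step {
    name: &'static str,
    action: fn(&mut Vec<String>) -> Result<(), String>,
    undo: fn(&mut Vec<String>),
}

/// Run the steps in order. If one fails, undo the steps that already
/// completed, in reverse order, and report which step failed.
fn run_saga(steps: &[Step], log: &mut Vec<String>) -> Result<(), String> {
    for (i, step) in steps.iter().enumerate() {
        if let Err(e) = (step.action)(log) {
            for done in steps[..i].iter().rev() {
                (done.undo)(log);
            }
            return Err(format!("{} failed: {}", step.name, e));
        }
    }
    Ok(())
}

// Stand-ins for real provisioning work; each one only records what it did.
fn alloc_disk(log: &mut Vec<String>) -> Result<(), String> {
    log.push("disk allocated".to_string());
    Ok(())
}
fn free_disk(log: &mut Vec<String>) {
    log.push("disk freed".to_string());
}
fn attach_network(log: &mut Vec<String>) -> Result<(), String> {
    log.push("network attached".to_string());
    Ok(())
}
fn detach_network(log: &mut Vec<String>) {
    log.push("network detached".to_string());
}
// This step always fails, to show the unwind path.
fn boot_vm(_log: &mut Vec<String>) -> Result<(), String> {
    Err("no capacity".to_string())
}
fn stop_vm(log: &mut Vec<String>) {
    log.push("vm stopped".to_string());
}

/// A made-up three-step "provision a VM" workflow.
fn provision_vm_steps() -> Vec<Step> {
    vec![
        Step { name: "alloc_disk", action: alloc_disk, undo: free_disk },
        Step { name: "attach_network", action: attach_network, undo: detach_network },
        Step { name: "boot_vm", action: boot_vm, undo: stop_vm },
    ]
}
```

Calling `run_saga(&provision_vm_steps(), &mut log)` with an empty `log` returns an error naming `boot_vm`, and the log shows the network being detached and the disk being freed, in that order, so nothing is left half-provisioned.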
[00:27:49] Bryan: So there are, it, it is fractally complicated. There, there is a lot of complexity here in, in software, in the software system and all of that. We kind of, we call the control plane. Um, and it, this is the what exists at AWS at GCP, at Azure. When you are hitting an endpoint that's provisioning an EC2 instance for you. [00:28:10] Bryan: There is an AWS control plane that is, is doing all of this and has, uh, some of these similar aspects and certainly some of these similar challenges. Are vSphere / Proxmox / Hyper-V in the same category? [00:28:20] Jeremy: And for people who have run their own servers with something like say VMware or Hyper V or Proxmox, are those in the same category? [00:28:32] Bryan: Yeah, I mean a little bit. I mean, it kind of like vSphere Yes. Via VMware. No. So it's like you, uh, VMware ESX is, is kind of a key building block upon which you can build something that is a more meaningful distributed system. When it's just like a machine that you're provisioning VMs on, it's like, okay, well that's actually, you as the human might be the control plane. [00:28:52] Bryan: Like, that's, that, that's, that's a much easier problem. Um, but when you've got, you know, tens, hundreds, thousands of machines, you need to do it robustly. You need something to coordinate that activity and you know, you need to pick which sled you land on. You need to be able to move these things. You need to be able to update that whole system. [00:29:06] Bryan: That's when you're getting into a control plane. So, you know, some of these things have kind of edged into a control plane, certainly VMware. Um, now Broadcom, um, has delivered something that's kind of cloudish. Um, I think that for folks that are truly born on the cloud, it, it still feels somewhat, uh, like you're going backwards in time when you, when you look at these kind of on-prem offerings. [00:29:29] Bryan: Um, but, but it, it, it's got these aspects to it for sure. 
Um, and I think that we're, um, some of these other things when you're just looking at KVM or just looks looking at Proxmox you kind of need to, to connect it to other broader things to turn it into something that really looks like manageable infrastructure. [00:29:47] Bryan: And then many of those projects are really, they're either proprietary projects, uh, proprietary products like vSphere, um, or you are really dealing with open source projects that are. Not necessarily aimed at the same level of scale. Um, you know, you look at a, again, Proxmox or, uh, um, you'll get an OpenStack. [00:30:05] Bryan: Um, and you know, OpenStack is just a lot of things, right? I mean, OpenStack has got so many, the OpenStack was kind of a, a free for all, for every infrastructure vendor. Um, and I, you know, there was a time people were like, don't you, aren't you worried about all these companies together that, you know, are coming together for OpenStack? [00:30:24] Bryan: I'm like, haven't you ever worked for like a company? Like, companies don't get along. By the way, it's like having multiple companies work together on a thing that's bad news, not good news. And I think, you know, one of the things that OpenStack has definitely struggled with, kind of with what, actually the, the, there's so many different kind of vendor elements in there that it's, it's very much not a product, it's a project that you're trying to run. [00:30:47] Bryan: But that's, but that very much is in, I mean, that's, that's similar certainly in spirit. [00:30:53] Jeremy: And so I think this is kind of like you're alluding to earlier, the piece that allows you to allocate, compute, storage, manage networking, gives you that experience of I can go to a web console or I can use an API and I can spin up machines, get them all connected. At the end of the day, the control plane. Is allowing you to do that in hopefully a user-friendly way. [00:31:21] Bryan: That's right. Yep. 
And in the, I mean, in order to do that in a modern way, it's not just like a user-friendly way. You really need to have a CLI and a web UI and an API. Those all need to be drawn from the same kind of single ground truth. Like you don't wanna have any of those be an afterthought for the other. [00:31:39] Bryan: You wanna have the same way of generating all of those different endpoints and, and entries into the system. Building a control plane now has better tools (Rust, CockroachDB) [00:31:46] Jeremy: And if you take your time at Joyent as an example. What kind of tools existed for that versus how much did you have to build in-house for as far as the hypervisor and managing the compute and all that? [00:32:02] Bryan: Yeah, so we built more or less everything in house. I mean, what you have is, um, and I think, you know, over time we've gotten slightly better tools. Um, I think, and, and maybe it's a little bit easier to talk about the, kind of the tools we started at Oxide because we kind of started with a, with a clean sheet of paper at oxide. [00:32:16] Bryan: We wanted to, knew we wanted to go build a control plane, but we were able to kind of go revisit some of the components. So actually, and maybe I'll, I'll talk about some of those changes. So when we, at, For example, at Joyent, when we were building a cloud at Joyent, there wasn't really a good distributed database. [00:32:34] Bryan: Um, so we were using Postgres as our database for metadata and there were a lot of challenges. And Postgres is not a distributed database. It's running. With a primary secondary architecture, and there's a bunch of issues there, many of which we discovered the hard way. Um, when we were coming to oxide, you have much better options to pick from in terms of distributed databases. [00:32:57] Bryan: You know, we, there was a period that now seems maybe potentially brief in hindsight, but of a really high quality open source distributed databases. 
So there were really some good ones to pick from. Um, we built on CockroachDB, on CRDB. Um, so that was a really important component that we had at Oxide that we didn't have at Joyent. [00:33:19] Bryan: Um, so I wouldn't say we were rolling our own distributed database; we were just using Postgres and dealing with an enormous amount of pain there in terms of the surround. Um, on top of that, you know, a control plane is much more than a database, obviously. Uh, there's a whole bunch of software that you need to go write [00:33:40] Bryan: to be able to transform these kind of API requests into something that is reliable infrastructure, right? And there's a lot to that. Uh, especially when networking gets in the mix, when storage gets in the mix, there are a whole bunch of complicated steps that need to be done. Um, at Joyent, [00:33:59] Bryan: in part because of the history of the company, and like, look, this just is not gonna sound good, but it just is what it is and I'm just gonna own it: we did it all in Node at Joyent, which I know right now just sounds like, well, you built it with Tinker Toys. You okay? [00:34:18] Bryan: Uh, did you think, you built the skyscraper with Tinker Toys? Uh, it's like, well, okay. We actually had greater aspirations for the Tinker Toys once upon a time, and it was better than, you know, Twisted Python and EventMachine from Ruby, and we weren't gonna do it in Java. All right. [00:34:32] Bryan: So, but let's just say that that experiment did ultimately end in a predictable fashion. Um, and we decided that maybe Node was not gonna be the best decision long term. Um, Joyent was the company behind Node.js. Uh, back in the day, Ryan Dahl worked for Joyent. Uh, and then we
[00:34:53] Bryan: Uh, landed that in a foundation in about, uh, what, 2015, something like that. Um, and began to consider our world beyond, uh, beyond Node. Rust at Oxide [00:35:04] Bryan: A big tool that we had in the arsenal when we started Oxide is Rust. Um, and so indeed the name of the company is, is a tip of the hat to the language that we were pretty sure we were gonna be building a lot of stuff in. [00:35:16] Bryan: Namely Rust. And, uh, rust is, uh, has been huge for us, a very important revolution in programming languages. you know, there, there, there have been different people kind of coming in at different times and I kinda came to Rust in what I, I think is like this big kind of second expansion of rust in 2018 when a lot of technologists were think, uh, sick of Node and also sick of Go. [00:35:43] Bryan: And, uh, also sick of C++. And wondering is there gonna be something that gives me the, the, the performance, of that I get outta C. The, the robustness that I can get out of a C program but is is often difficult to achieve. but can I get that with kind of some, some of the velocity of development, although I hate that term, some of the speed of development that you get out of a more interpreted language. [00:36:08] Bryan: Um, and then by the way, can I actually have types, I think types would be a good idea? Uh, and rust obviously hits the sweet spot of all of that. Um, it has been absolutely huge for us. I mean, we knew when we started the company again, oxide, uh, we were gonna be using rust in, in quite a, quite a. Few places, but we weren't doing it by fiat. [00:36:27] Bryan: Um, we wanted to actually make sure we're making the right decision, um, at, at every different, at every layer. Uh, I think what has been surprising is the sheer number of layers at which we use rust in terms of, we've done our own embedded firmware in rust. We've done, um, in, in the host operating system, which is still largely in C, but very big components are in rust. 
[00:36:47] Bryan: The hypervisor, Propolis, is all in Rust. Uh, and then of course the control plane, that distributed system on top of that, is all in Rust. So that was a very important thing that we very much did not need to build ourselves. We were able to really leverage a terrific community. Um, we were able to use, and we'd done this at Joyent as well, but at Oxide we've used illumos as a host OS component; our variant is called Helios. [00:37:11] Bryan: Um, we've used bhyve as that kind of internal hypervisor component. We've made use of a bunch of different open source components to build this thing, which has been really, really important for us. Uh, and open source components that didn't exist even like five years prior. [00:37:28] Bryan: That's part of why we felt that 2019 was the right time to start the company. And so we started Oxide. The problems building a control plane in Node [00:37:34] Jeremy: You had mentioned that at Joyent, you had tried to build this in Node. What were the issues or the challenges that you had doing that? [00:37:46] Bryan: Oh boy. Yeah. Again, I kind of had higher hopes in 2010, I would say, when we set on this. Um, the problem that we had, just writ large: JavaScript is really designed to allow as many people on earth to write a program as possible, which is good. I mean, that's a laudable goal. [00:38:09] Bryan: That is ultimately the goal, such as it is, of JavaScript. It's actually hard to know what the goal of JavaScript is, unfortunately, because Brendan Eich never actually wrote a book. So there is not a canonical one; you've got kind of Doug Crockford and other people who've written things on JavaScript, but it's hard to know kind of what the original intent of JavaScript is. [00:38:27] Bryan: The name doesn't even express original intent, right?
It was called Live Script, and it was kind of renamed to JavaScript during the Java Frenzy of the late nineties. A name that makes no sense. There is no Java in JavaScript. that is kind of, I think, revealing to kind of the, uh, the unprincipled mess that is JavaScript. [00:38:47] Bryan: It, it, it's very pragmatic at some level, um, and allows anyone to, it makes it very easy to write software. The problem is it's much more difficult to write really rigorous software. So, uh, and this is what I should differentiate JavaScript from TypeScript. This is really what TypeScript is trying to solve. [00:39:07] Bryan: TypeScript is like. How can, I think TypeScript is a, is a great step forward because TypeScript is like, how can we bring some rigor to this? Like, yes, it's great that it's easy to write JavaScript, but that's not, we, we don't wanna do that for Absolutely. I mean that, that's not the only problem we solve. [00:39:23] Bryan: We actually wanna be able to write rigorous software and it's actually okay if it's a little harder to write rigorous software that's actually okay if it gets leads to, to more rigorous artifacts. Um, but in JavaScript, I mean, just a concrete example. You know, there's nothing to prevent you from referencing a property that doesn't actually exist in JavaScript. [00:39:43] Bryan: So if you fat finger a property name, you are relying on something to tell you. By the way, I think you've misspelled this because there is no type definition for this thing. And I don't know that you've got one that's spelled correctly, one that's spelled incorrectly, that's often undefined. And then the, when you actually go, you say you've got this typo that is lurking in your what you want to be rigorous software. [00:40:07] Bryan: And if you don't execute that code, like you won't know that's there. And then you do execute that code. And now you've got a, you've got an undefined object. 
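Editor's sketch: for contrast with the misspelled-property problem Bryan describes, here is how the same mistake plays out in Rust, the language he says Oxide later chose. This is a minimal illustration under stated assumptions, not code from Joyent or Oxide; `Instance`, `describe`, and `lookup` are hypothetical names. A misspelled struct field is rejected when the program is compiled, and even a dynamic, string-keyed lookup hands back a value that has to be checked before it can be used.

```rust
use std::collections::HashMap;

/// A typed record: the compiler knows exactly which fields exist.
struct Instance {
    hostname: String,
    memory_mib: u64,
}

fn describe(i: &Instance) -> String {
    // Writing `i.hostnmae` here would be a compile-time error, so the typo
    // cannot lurk in a code path that is rarely executed.
    format!("{} ({} MiB)", i.hostname, i.memory_mib)
}

/// A dynamic lookup, closer to a JavaScript property access. A misspelled key
/// still compiles, but the missing value surfaces as an explicit `Err` at the
/// point of the lookup rather than as an `undefined` that travels onward.
fn lookup<'a>(props: &'a HashMap<String, String>, key: &str) -> Result<&'a str, String> {
    props
        .get(key)
        .map(|value| value.as_str())
        .ok_or_else(|| format!("missing property: {}", key))
}
```

With a map containing only `hostname`, `lookup(&props, "hostnmae")` returns an `Err` naming the misspelled key, so the origin of the mistake is visible where it happens.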
And now that's either gonna be an exception or it can, again, depends on how that's handled. It can be really difficult to determine the origin of that, of, of that error, of that programming. [00:40:26] Bryan: And that is a programmer error. And one of the big challenges that we had with Node is that programmer errors and operational errors, like, you know, I'm out of disk space as an operational error. Those get conflated and it becomes really hard. And in fact, I think the, the language wanted to make it easier to just kind of, uh, drive on in the event of all errors. [00:40:53] Bryan: And it's like, actually not what you wanna do if you're trying to build a reliable, robust system. So we had. No end of issues. [00:41:01] Bryan: We've got a lot of experience developing rigorous systems, um, again coming out of operating systems development and so on. And we want, we brought some of that rigor, if strangely, to JavaScript. So one of the things that we did is we brought a lot of postmortem, diagnos ability and observability to node. [00:41:18] Bryan: And so if, if one of our node processes. Died in production, we would actually get a core dump from that process, a core dump that we could actually meaningfully process. So we did a bunch of kind of wild stuff. I mean, actually wild stuff where we could actually make sense of the JavaScript objects in a binary core dump. JavaScript values ease of getting started over robustness [00:41:41] Bryan: Um, and things that we thought were really important, and this is the, the rest of the world just looks at this being like, what the hell is this? I mean, it's so out of step with it. The problem is that we were trying to bridge two disconnected cultures of one developing really. Rigorous software and really designing it for production, diagnosability and the other, really designing it to software to run in the browser and for anyone to be able to like, you know, kind of liven up a webpage, right? 
[00:42:10] Bryan: Is kinda the origin of, of live script and then JavaScript. And we were kind of the only ones sitting at the intersection of that. And you begin when you are the only ones sitting at that kind of intersection. You just are, you're, you're kind of fighting a community all the time. And we just realized that we are, there were so many things that the community wanted to do that we felt are like, no, no, this is gonna make software less diagnosable. It's gonna make it less robust. The NodeJS split and why people left [00:42:36] Bryan: And then you realize like, I'm, we're the only voice in the room because we have got, we have got desires for this language that it doesn't have for itself. And this is when you realize you're in a bad relationship with software. It's time to actually move on. And in fact, actually several years after, we'd already kind of broken up with node. [00:42:55] Bryan: Um, and it was like, it was a bit of an acrimonious breakup. there was a, uh, famous slash infamous fork of node called IoJS Um, and this was viewed because people, the community, thought that Joyent was being what was not being an appropriate steward of node js and was, uh, not allowing more things to come into to, to node. [00:43:19] Bryan: And of course, the reason that we of course, felt that we were being a careful steward and we were actively resisting those things that would cut against its fitness for a production system. But it's some way the community saw it and they, and forked, um, and, and I think the, we knew before the fork that's like, this is not working and we need to get this thing out of our hands. Platform is a reflection of values node summit talk [00:43:43] Bryan: And we're are the wrong hands for this? This needs to be in a foundation. Uh, and so we kind of gone through that breakup, uh, and maybe it was two years after that. 
That, uh, a friend of mine who was running Node Summit, who has unfortunately now passed away, Charles Beeler, um, Charles was a venture capitalist, great guy, and Charles was running Node Summit and came to me in 2017. [00:44:07] Bryan: He is like, I really want you to keynote Node Summit. And I'm like, Charles, I'm not gonna do that. I've got nothing nice to say. Like, I'm the last person you want to keynote. He's like, oh, if you have nothing nice to say, you should definitely keynote. I'm like, oh God, okay, here we go. [00:44:22] Bryan: He's like, no, I really want you to talk about, like, you should talk about the Joyent breakup with Node.js. I'm like, oh man. [00:44:29] Bryan: And that led to a talk that I'm really happy that I gave, 'cause it was a very important talk for me personally. Uh, called Platform is a reflection of values, and really looking at the values that we had for Node and the values that Node had for itself. And they didn't line up. [00:44:49] Bryan: And the problem is that the values that Node had for itself and the values that we had for Node are all kind of positives, right? Like there's nobody in the Node community who's like, I don't want rigor, I hate rigor. It's just that if they had to choose between rigor and making the language approachable, [00:45:09] Bryan: they would choose approachability every single time. They would never choose rigor. And, you know, that was a big eye-opener. I would say, if you watch this talk, [00:45:20] Bryan: because I knew that the audience was gonna be filled with people who had been a part of the fork in 2014, I think was the fork, the io.js fork. And I knew that there were some people that had been there for the fork, and [00:45:41] Bryan: I set a little bit of a trap for the audience.
And the trap was, I kind of talked about the values that we had and the aspirations we had for Node, the aspirations that Node had for itself, and how they were different. [00:45:53] Bryan: And, you know, I'm like, look, in hindsight a fracture was inevitable. And in 2014 there was finally a fracture. And do people know what happened in 2014? And if you listen to that talk, everyone almost says in unison, like, io.js. I'm like, oh right. io.js. Right. That's actually not what I was thinking of. [00:46:19] Bryan: And I go to the next slide and it's a tweet from a guy named TJ Holowaychuk, who was the most prolific contributor to Node. And it was his tweet, also in 2014, before the io.js fork, explaining that he was leaving Node and that he was going to Go. And if you turn the volume all the way up, you can hear the audience gasp. [00:46:41] Bryan: And it's just delicious because the community had never really confronted why TJ left. Um, and I went through a couple folks, Felix, a bunch of other folks, early Node folks that were there in 2010, were leaving in 2014, and they were going to Go primarily, and they were going to Go because they were sick of the same things that we were sick of. [00:47:09] Bryan: They had hit the same things that we had hit and they were frustrated. I really do believe this, that platforms do reflect their own values. And when you are making a software decision, you are selecting values. [00:47:26] Bryan: You should select values that align with the values that you have for that software. That's way more important than other things that people look at. I think people look at, for example, quote unquote community size way too frequently. Community size is like, eh, maybe it can be fine. [00:47:44] Bryan: I've been in very large communities, Node.
I've been in super small open source communities like AUMs and RAs, a bunch of others. there are strengths and weaknesses to both approaches just as like there's a strength to being in a big city versus a small town. Me personally, I'll take the small community more or less every time because the small community is almost always self-selecting based on values and just for the same reason that I like working at small companies or small teams. [00:48:11] Bryan: There's a lot of value to be had in a small community. It's not to say that large communities are valueless, but again, long answer to your question of kind of where did things go south with Joyent and node. They went south because the, the values that we had and the values the community had didn't line up and that was a very educational experience, as you might imagine. [00:48:33] Jeremy: Yeah. And, and given that you mentioned how, because of those values, some people moved from Node to go, and in the end for much of what oxide is building. You ended up using rust. What, what would you say are the, the values of go and and rust, and how did you end up choosing Rust given that. Go's decisions regarding generics, versioning, compilation speed priority [00:48:56] Bryan: Yeah, I mean, well, so the value for, yeah. And so go, I mean, I understand why people move from Node to Go, go to me was kind of a lateral move. Um, there were a bunch of things that I, uh, go was still garbage collected, um, which I didn't like. Um, go also is very strange in terms of there are these kind of like. [00:49:17] Bryan: These autocratic kind of decisions that are very bizarre. Um, there, I mean, generics is kind of a famous one, right? Where go kind of as a point of principle didn't have generics, even though go itself actually the innards of go did have generics. It's just that you a go user weren't allowed to have them. 
[00:49:35] Bryan: And you know, there was an old cartoon years and years ago about how, when a technologist is telling you that something is technically impossible, that actually means I don't feel like it. Uh, and there was a certain degree of like, generics are technically impossible in Go. It's like, hey, actually there are. [00:49:51] Bryan: And I just think that the arguments against generics were kind of disingenuous. Um, and indeed, they ended up adopting generics. And then there's some super weird stuff around, like, they're very anti-assertion, which is like, what? How is someone against assertions? It doesn't even make any sense, but it's like, oh, nope. [00:50:10] Bryan: Okay. There's a whole screed on it. Nope, we're against assertions, and, you know, against versioning. There was another thing, like, Rob Pike has kind of famously been like, you should always just run on the latest commit. And you're like, does that make sense? I mean this, we actually built it. [00:50:26] Bryan: And so there are a bunch of things like that. You're just like, okay, this is just exhausting. I mean, there's some things about Go that are great and plenty of other things that I'm just not a fan of. Um, I think that in the end, Go cares a lot about compile time. It's super important for Go, right? [00:50:44] Bryan: Very quick compile time. I'm like, okay. But compile time is not, like, it's not unimportant, it doesn't have zero importance. But I've got other things that are lots more important than that. Um, what I really care about is I want a high-performing artifact. I wanted garbage collection outta my life.
Don't think garbage collection has good trade offs [00:51:00] Bryan: I, I gotta tell you, I, I like garbage collection to me is an embodiment of this like, larger problem of where do you put cognitive load in the software development process. And what garbage collection is saying to me it is right for plenty of other people and the software that they wanna develop. [00:51:21] Bryan: But for me and the software that I wanna develop, infrastructure software, I don't want garbage collection because I can solve the memory allocation problem. I know when I'm like, done with something or not. I mean, it's like I, whether that's in, in C with, I mean it's actually like, it's really not that hard to not leak memory in, in a C base system. [00:51:44] Bryan: And you can. give yourself a lot of tooling that allows you to diagnose where memory leaks are coming from. So it's like that is a solvable problem. There are other challenges with that, but like, when you are developing a really sophisticated system that has garbage collection is using garbage collection. [00:51:59] Bryan: You spend as much time trying to dork with the garbage collector to convince it to collect the thing that you know is garbage. You are like, I've got this thing. I know it's garbage. Now I need to use these like tips and tricks to get the garbage collector. I mean, it's like, it feels like every Java performance issue goes to like minus xx call and use the other garbage collector, whatever one you're using, use a different one and using a different, a different approach. [00:52:23] Bryan: It's like, so you're, you're in this, to me, it's like you're in the worst of all worlds where. the reason that garbage collection is helpful is because the programmer doesn't have to think at all about this problem. But now you're actually dealing with these long pauses in production. [00:52:38] Bryan: You're dealing with all these other issues where actually you need to think a lot about it. 
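Editor's sketch: Bryan's point that he knows when he is done with something, and does not want a collector deciding later, maps onto Rust's scope-based cleanup. The following is a minimal illustration rather than anything from Oxide's code; `Buffer` and `handle_request` are hypothetical names, and the shared `events` log exists only so the order of operations can be observed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A stand-in for some resource; it records the moment it is released.
struct Buffer {
    name: &'static str,
    events: Rc<RefCell<Vec<String>>>,
}

impl Drop for Buffer {
    // Runs exactly when the value goes out of scope, not at some later
    // point chosen by a garbage collector.
    fn drop(&mut self) {
        self.events.borrow_mut().push(format!("freed {}", self.name));
    }
}

fn handle_request(events: &Rc<RefCell<Vec<String>>>) {
    let _scratch = Buffer { name: "scratch", events: Rc::clone(events) };
    events.borrow_mut().push("request handled".to_string());
    // `_scratch` is dropped here, at the end of the function body.
}
```

After one call to `handle_request`, the log reads "request handled" followed by "freed scratch": the release happens at a point the programmer can read off the source, which is the determinism Bryan is contrasting with collector pauses in production.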
And it's kind of, it, it it's witchcraft. It, it, it's this black box that you can't see into. So it's like, what problem have we solved exactly? And I mean, so the fact that go had garbage collection, it's like, eh, no, I, I do not want, like, and then you get all the other like weird fatwahs and you know, everything else. [00:52:57] Bryan: I'm like, no, thank you. Go is a no thank you for me, I, I get it why people like it or use it, but it's, it's just, that was not gonna be it. Choosing Rust [00:53:04] Bryan: I'm like, I want C. but I, there are things I didn't like about C too. I was looking for something that was gonna give me the deterministic kind of artifact that I got outta C. But I wanted library support and C is tough because there's, it's all convention. you know, there's just a bunch of other things that are just thorny. And I remember thinking vividly in 2018, I'm like, well, it's rust or bust. Ownership model, algebraic types, error handling [00:53:28] Bryan: I'm gonna go into rust. And, uh, I hope I like it because if it's not this, it's gonna like, I'm gonna go back to C I'm like literally trying to figure out what the language is for the back half of my career. Um, and when I, you know, did what a lot of people were doing at that time and people have been doing since of, you know, really getting into rust and really learning it, appreciating the difference in the, the model for sure, the ownership model people talk about. [00:53:54] Bryan: That's also obviously very important. It was the error handling that blew me away. And the idea of like algebraic types, I never really had algebraic types. Um, and the ability to, to have. And for error handling is one of these really, uh, you, you really appreciate these things where it's like, how do you deal with a, with a function that can either succeed and return something or it can fail, and the way c deals with that is bad with these kind of sentinels for errors. 
[00:54:27] Bryan: And, you know, does negative one mean success? Does negative one mean failure? Does zero mean failure? In some C functions, zero means failure. Traditionally in Unix, zero means success. And like, what if you wanna return a file descriptor, you know, it's like, oh. And then it's like, okay, then zero through positive N will be a valid result. [00:54:44] Bryan: Negative numbers will be, and like, was it negative one and it set errno, or is it a negative number that did not? I mean, that's all convention, right? People do all those different things and it's all convention and it's easy to get wrong, easy to have bugs, can't be statically checked and so on. Um, and then what Go says is like, well, you're gonna have two return values and then you're gonna have to just constantly check all of these all the time. Um, which is also kind of gross. Um, JavaScript is like, hey, let's toss an exception. If we don't like something, if we see an error, we'll throw an exception. [00:55:15] Bryan: There are a bunch of reasons I don't like that. Um, and you look at what Rust does, where it's like, no, no, no. We're gonna have these algebraic types, which is to say this thing can be a this thing or that thing, but it has to be one of these. And by the way, you don't get to process this thing until you conditionally match on one of these things. [00:55:35] Bryan: You're gonna have to have a pattern match on this thing to determine if it's a this or a that, and in the Result type, the result is a generic where it's gonna be either an Ok that contains the thing you wanna return, or it's gonna be an Err that contains your error, and it forces your code to deal with that.
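Editor's sketch: the `Result` idea Bryan describes above, in a few lines of Rust. This is a toy under stated assumptions, not code from the episode: `OpenError`, `open_checked`, and `describe` are hypothetical names, and the "file system" is a hard-coded match with no real I/O. What it shows is that the success value and the error travel in separate variants, so there is no sentinel convention to remember, and the caller has to match on the result before getting at the file descriptor.

```rust
/// The ways our pretend `open` can fail. (Hypothetical error type.)
#[derive(Debug, PartialEq)]
enum OpenError {
    NotFound,
    PermissionDenied,
}

/// Either `Ok` holding a file descriptor or `Err` holding the reason,
/// never a magic number that means one or the other by convention.
fn open_checked(path: &str) -> Result<u32, OpenError> {
    match path {
        "/etc/motd" => Ok(3),
        "/root/secret" => Err(OpenError::PermissionDenied),
        _ => Err(OpenError::NotFound),
    }
}

/// The caller cannot use the descriptor without handling both outcomes;
/// leaving out an arm of this `match` is a compile-time error.
fn describe(path: &str) -> String {
    match open_checked(path) {
        Ok(fd) => format!("opened fd {}", fd),
        Err(OpenError::NotFound) => "no such file".to_string(),
        Err(OpenError::PermissionDenied) => "permission denied".to_string(),
    }
}
```

Adding a third variant to `OpenError` would make `describe` stop compiling until the new case is handled, which is the shift of cognitive load from production to development that Bryan goes on to describe.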
[00:55:57] Bryan: And what that does is it shifts the cognitive load from the person that is operating this thing in production to the actual developer that is in development. And I think that, to me, is like, I love that shift. Um, and that shift to me is really important. Um, and that's what I was missing, that's what Rust gives you.

[00:56:23] Bryan: Rust forces you to think about your code as you write it, but as a result, you have an artifact that is much more supportable, much more sustainable, and much faster.

Prefer to frontload cognitive load during development instead of at runtime

[00:56:34] Jeremy: Yeah, it sounds like you would rather take the time during the development to think about these issues, because whether it's garbage collection or it's error handling, at runtime when you're trying to solve a problem, then it's much more difficult than having dealt with it to start with.

[00:56:57] Bryan: Yeah, absolutely. And I just think that, like, why, also, if it's software, again, if it's infrastructure software, I mean, kinda the question that you should have when you're writing software is: how long is this software gonna live? How many people are gonna use this software? Uh, and if you are writing an operating system, the answer for this thing that you're gonna write is, it's gonna live for a long time.

[00:57:18] Bryan: Like, if we just look at plenty of aspects of the system that have been around for decades, it's gonna live for a long time and many, many, many people are gonna use it. Why would we not expect people writing that software to have more cognitive load when they're writing it, to give us something that's gonna be a better artifact?

[00:57:38] Bryan: Now conversely, you're like, hey, I kind of don't care about this. And like, I don't know, I'm just like, I wanna see if this whole thing works. I've got, like, I'm just stringing this together.
I don't, like, no, the software, like, will be lucky if it survives until tonight, but then like, who cares? Yeah. Yeah.

[00:57:52] Bryan: Garbage collect. You know, if you're prototyping something, whatever. And this is why you really do get, like, you know, different choices, different technology choices, depending on the way that you wanna solve the problem at hand. And for the software that I wanna write, I do like that cognitive load that is upfront.

With LLMs maybe you can get the benefit of the robust artifact with less cognitive load

[00:58:10] Bryan: Um, and although I think the thing that is really wild, that is the twist that I don't think anyone really saw coming, is that in an LLM age, the cognitive load upfront almost needs an asterisk on it, because so much of that can be assisted by an LLM. And now, I mean, I would like to believe, and maybe this is me being optimistic, that in the LLM age, we will see, I mean, Rust is a great fit for the LLM age because the LLM itself can get a lot of feedback about whether the software that's written is correct or not.

[00:58:44] Bryan: Much more so than you can for other environments.

[00:58:48] Jeremy: Yeah, that is an interesting point, in that I think when people first started trying out the LLMs to code, it was really good at these maybe looser languages like Python or JavaScript, and initially wasn't so good at something like Rust. But it sounds like as that improves, if it can write it, then because of the rigor or the memory management or the error handling that the language is forcing you to do, it might actually end up being a better choice for people using LLMs.

[00:59:27] Bryan: Absolutely. It gives you more certainty in the artifact that you've delivered. I mean, you know a lot about a Rust program that compiles correctly.
I mean, there are certain classes of errors that you don't have, um, that you actually don't know on a C program or a Go program or a JavaScript program.

[00:59:46] Bryan: I think that's gonna be really important. I think we are on the cusp. Maybe we've already seen it, this kind of great bifurcation in the software that we writ
Are you preparing for the CompTIA exam or looking to boost your IT skills development? This episode dives deep into Windows troubleshooting with a focus on network diagnostics, a crucial topic for any tech exam prep. We guide you through validating a Windows machine's network identity using ipconfig, performing a strict ping sequence to verify communication scope, and using nslookup to troubleshoot DNS issues. Following this disciplined order ensures clarity and efficiency, making every fix both defensible and effective. Whether you're studying solo or in a study group, this step-by-step approach to Windows networking will enhance your technology education and help you succeed in your IT certification journey.

We dig into why a 169.254 APIPA address narrows the culprit to DHCP or network infrastructure, not the NIC or OS. Then we connect the dots between ports and services using netstat, making it clear when a service is misconfigured rather than the network being “down.” From web ports 80 and 443 to SMB 445 and RDP 3389, you'll see how listening states reveal the true problem fast.

Powerful remote access demands restraint. We break down when RDP makes sense, why Network Level Authentication should be non-negotiable, and how consent-based Remote Assist reduces risk when users need to stay in control. For scale, we highlight WinRM over HTTPS and SSH as secure, script-friendly options that keep credentials protected and GUIs out of the attack surface.

Performance complaints need evidence, not guesswork. We show how Task Manager, Resource Monitor, Performance Monitor, and Event Viewer combine to reveal bottlenecks, crashes, and policy blocks. When things get critical (no boot, blue screens), we map BIOS vs UEFI realities, then use WinRE tools in the safest order to recover without data loss. By the end, you'll have a repeatable framework: identity, routing, names, services, performance, platform, recovery.
Subscribe, share with a teammate who still starts with the browser, and tell us: what's your first command when “nothing works”?

Support the show. Art by Sarah/Desmond. Music by Joakim Karud. Little chacha Productions. Juan Rodriguez can be reached at: TikTok @ProfessorJrod, ProfessorJRod@gmail.com, @Prof_JRod, Instagram ProfessorJRod.
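The APIPA step in the episode above (an address in 169.254.0.0/16 pointing at DHCP or the network rather than the NIC or OS) can be sketched as a small triage helper. This is an assumption-laden illustration, not a tool from the show: the `AddressKind` enum and the `classify` and `classify_str` functions are hypothetical names, while `Ipv4Addr::is_link_local`, `is_loopback`, and `is_private` are real methods in Rust's standard library.

```rust
use std::net::Ipv4Addr;

/// Rough triage buckets for an IPv4 address read off `ipconfig`.
/// The names are hypothetical, chosen for this sketch.
#[derive(Debug, PartialEq)]
pub enum AddressKind {
    /// 169.254.0.0/16: self-assigned, so no DHCP lease was obtained.
    Apipa,
    /// 127.0.0.0/8: the local host itself.
    Loopback,
    /// 10/8, 172.16/12, or 192.168/16: a normal internal address.
    Private,
    /// Anything else, for example a public address.
    Other,
}

/// Classify an already parsed address.
pub fn classify(addr: Ipv4Addr) -> AddressKind {
    if addr.is_link_local() {
        AddressKind::Apipa
    } else if addr.is_loopback() {
        AddressKind::Loopback
    } else if addr.is_private() {
        AddressKind::Private
    } else {
        AddressKind::Other
    }
}

/// Parse dotted-quad text first; returns None when the text is not IPv4.
pub fn classify_str(text: &str) -> Option<AddressKind> {
    text.trim().parse::<Ipv4Addr>().ok().map(classify)
}
```

An `Apipa` result suggests looking at the DHCP server, the switch port, or the cable before touching drivers or reinstalling anything, which matches the order of checks the episode recommends.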
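"A repeat of SoftBank's massive network outage? The 'PC malfunction problem' to watch out for this year"

What is the PC malfunction problem? After a PC is powered on, the program that starts before the OS is UEFI, which includes features such as Secure Boot, a security function that blocks malware from getting in. The "PC malfunction problem" now worrying PC users stems from the fact that Microsoft's UEFI certificates issued in 2011 expire in June 2026. If a Windows PC boots with an expired certificate, some NVIDIA graphics cards may reportedly be blocked by Secure Boot and cause boot failures. We asked PC writer 平澤寿康 about the likelihood of a malfunction and how to deal with it. "Microsoft is already distributing new certificates that were issued in 2023. Windows 11 users who run Windows Update regularly have the new certificates applied automatically, so there is nothing to worry about. Certificates for graphics cards and similar components are updated as well, so you can keep using your PC normally with no effect on booting." A repeat of the 2018 large-scale SoftBank outage, which was caused by an expired software certificate in switching equipment, looks avoidable.

UEFI also acts as a guard that ensures safety at PC startup. The most important UEFI function is Secure Boot. It checks whether software carries a legitimate signature and blocks malware from getting in, acting as a kind of security guard for the PC.

The fix is a software update. Windows Update has automatic updating enabled by default, so no manual action is needed. You can check whether UEFI is enabled by opening System Information.

[DIME's take] "Continuing to use Windows 10, for which support has ended and security updates are no longer provided, is a much bigger problem than the UEFI certificates," says 平澤. Watch out for PC malfunctions caused by malware infection. Reporting and text: 安藤政弘. Editing: 渡辺和博.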
Tony:
- Things heating up with AI usage in video games: https://www.polygon.com/clair-obscur-expedition-33-indie-game-awards-controvsery-gen-ai-explained/?link_source=ta_bluesky_link&taid=6949aa663265bb00019eb313&utm_campaign=trueanthem&utm_medium=social&utm_source=bluesky
- Larian got in trouble too: https://wccftech.com/larian-studios-will-refrain-from-using-genai-tools-to-develop-concept-art-will-use-it-elsewhere/
- The Division Definitive Edition: https://www.ign.com/articles/it-looks-like-the-division-definitive-edition-is-on-the-way-10-years-after-its-initial-release
- What is GameStop even doing? https://arstechnica.com/gaming/2026/01/is-this-the-beginning-of-the-end-for-gamestop/
- Arc Raiders aggression-based matchmaking: https://www.ign.com/articles/this-is-like-a-whole-new-world-arc-raiders-players-are-using-aggression-based-matchmaking-to-retire-from-pvp

Jarron:
- DLSS 4.5: https://www.digitalfoundry.net/news/2026/01/nvidia-announces-dlss-4-5-new-transformer-model-already-live
- New Strix Halo chips for gaming: https://www.theverge.com/tech/855463/amd-strix-halo-ai-max-plus-388-392-handheld-gaming
- Intel is going to release a custom Panther Lake CPU for handhelds: https://www.theverge.com/tech/857252/intel-handheld-gaming-pc-panther-lake-custom-cpu
- GameSir made the weirdest controller that I kinda want: https://www.theverge.com/news/851259/gamesir-switch-turbo-drive-controller-steering-wheel-wireless-force-feedback
- Star Wars can jailbreak a PS5: https://www.engadget.com/gaming/playstation/prices-for-an-old-star-wars-game-have-ballooned-because-of-its-role-in-a-ps5-jailbreak-230604276.html?src=rss

Owen:
- Here's a game for you. UEFI boot game: https://www.tomshardware.com/software/dev-creates-uefi-games-compendium-where-you-have-to-win-to-access-your-computer-10-month-project-will-shutdown-your-pc-if-you-lose
- Brandon Sanderson AAA game? Well, it's in the works: https://wccftech.com/mistborn-author-is-talking-to-aaa-game-studios-adapt-fantasy-saga/

Lando:
- Pay attention! Free games! https://kotaku.com/steam-sale-team-17-hidden-codes-free-games-deal-secret-artwork-2000658974
Happy New Year! In this episode, Automox cybersecurity experts Ryan Braunstein and Seth Hoyt break down the security vulnerabilities you need to know heading into 2026.

First up: a ticking time bomb. Microsoft's 2011 Secure Boot certificates expire in June and October 2026, making this your top patching priority for the year. If your BIOS and OS aren't both updated, you're leaving the door wide open for rootkit attacks. Start auditing your hardware now. You have six months.

Next up: a Windows Installer Elevation of Privilege Vulnerability that exploits a time-of-check to time-of-use (TOCTOU) race condition. Think of it like swapping wristbands after the bouncer checks you at the door.

Finally, an actively exploited flaw in Desktop Window Manager that can leak sensitive information and even break out of sandboxes.

Patch your systems. Patch your BIOS. See you next month.
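The wristband analogy for TOCTOU can be made concrete with a generic file example. This sketch is not the Windows Installer flaw discussed in the episode, whose details are not given here. It only shows the general pattern, under the assumption of a program that wants to create a file only if it does not already exist. The function names `write_if_absent_racy` and `write_if_absent` are hypothetical; `OpenOptions::create_new` is a real standard-library call that asks the operating system to do the check and the creation as one step.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Racy pattern: check first, act later. Another process can create or
/// swap the file in the gap between `exists()` and `File::create`.
pub fn write_if_absent_racy(path: &Path, data: &[u8]) -> io::Result<bool> {
    if path.exists() {
        return Ok(false); // time of check
    }
    // The answer from the check above may already be stale here.
    let mut file = File::create(path)?; // time of use
    file.write_all(data)?;
    Ok(true)
}

/// Safer pattern: let the OS check and create in a single operation,
/// so there is no gap for anyone to exploit.
pub fn write_if_absent(path: &Path, data: &[u8]) -> io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(data)?;
            Ok(true)
        }
        // The file was already there: report it instead of overwriting.
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

The second function returns `Ok(true)` when it created the file and `Ok(false)` when the file already existed. The general design choice is to avoid making a decision from one system call and acting on it in a separate one.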
A census solved with cardboard, a company remade by a $5 billion gamble, and a tiny firmware layer that cracked open the PC market: this is the human story behind how computing became a platform, not a product. We go from Herman Hollerith's 1890 insight to IBM's sales-first system that taught the world to think in fields and records, and then to the cultural and ethical crosscurrents that come with scale. Those punched holes didn't just count people; they trained generations to quantify work, plan logistics, and make decisions with data.

The narrative turns at a crossroads in the early 1960s. Thomas J. Watson Jr. sees a maze of incompatible machines and bets the company on a single, compatible architecture: System/360. It demanded new chips, code, factories, and nerve. Launch day lands with shock and relief: orders flood in for a family of computers that finally speak the same language. That choice redefined the industry's economics: software could live longer than hardware, upgrades didn't mean rewrites, and customers stopped fearing growth. Architecture became destiny, and IBM set the standard that everyone from Apple to ARM would later emulate in their own ecosystems.

Then the stage shifts again to 1981, where a humble BIOS turns one machine into a platform. IBM documented how its firmware behaved; Compaq legally reimplemented it; the clone market ignited. Prices dropped, innovation surged, and the Wintel era took shape. IBM lost tight control but the world gained a common PC standard that carried software across brands and borders. From punch card schemas to UEFI, from batch jobs to cloud migrations, the same lesson repeats: design for compatibility, bet on continuity, and accept that openness can multiply impact.

If the story made you think differently about the architecture beneath your apps and devices, follow the show, share it with a friend, and leave a review to help others find Technology Tap. What bold standard, or act of openness, should today's tech leaders champion next?
Erotic Chats, UEFI, F5, Cisco, Doug Sings, Insiders, Lastpass, Sora, Aaran Leyland, and More on this edition of the Security Weekly News. Visit https://www.securityweekly.com/swn for all the latest episodes! Show Notes: https://securityweekly.com/swn-521
First up is a technical segment on UEFI shells: determining if they contain dangerous functionality that allows attackers to bypass Secure Boot. Then in the security news: Your vulnerability scanner is your weakest link; Scams that almost got me; The state of EDR is not good; You don't need to do that on a phone or Raspberry Pi; Hash cracking and exploits; Revisiting LG WebOS; Hardening Docker images; Hacking Moxa NPort; Shoddy academic research; The original sin of computing; Bodycam hacking; A new OS for ESP32; The AI bubble is going to burst; Mobile VPNs are not always secure. Visit https://www.securityweekly.com/psw for all the latest episodes! Show Notes: https://securityweekly.com/psw-896
A dead PC at the worst moment is a gut punch, unless you have a roadmap. We walk through the exact thinking that turns “no lights, no fans, no display” into a calm, step-by-step recovery, starting where every system truly begins: firmware. BIOS and UEFI aren't trivia; they decide how your machine discovers drives, validates bootloaders, and applies security like Secure Boot and TPM. That's why a simple post-update check of boot order, storage mode, and firmware toggles can rescue a lab full of “no boot device” errors in minutes.

From there, we get brutally honest about power. PSUs age, rails sag, and idle tests lie. You'll learn the outside-in “power ladder,” why a line-interactive UPS prevents ghost errors, and how unstable XMP profiles masquerade as OS problems. We demystify boot and drive failures (wrong boot entries, NVMe lane conflicts, cloning driver mismatches) and show how SMART data, free space, cooling, and firmware updates revive sluggish SSDs. Then we cut through RAID mythology: 0 for speed, 1 for uptime, 5 for read-heavy with risk, 6 for double-parity safety, and 10 for fast resilience. And we repeat the rule that saves careers: RAID is not backup. Verify restores, keep copies offsite or offline, and schedule tests before disaster strikes.

Video issues get the practical treatment too. No display? Check inputs and connect to the discrete GPU, not the motherboard. Blurry or artifacting under load? Validate refresh rates, cables, thermals, and PSU capacity. We close with a field checklist and a case study where a quality PSU upgrade stabilized 3D renders instantly, proof that systems thinking beats screen-chasing every time.

If you want a technician's mindset (evidence over assumptions, one variable at a time), this guide will sharpen your process and speed your fixes. If this helped you think like a tech, follow the show, share it with a teammate who's on call this week, and leave a quick review so more builders and troubleshooters can find it.

If you want to help me with my research, or to join my question/answer Zoom class, e-mail me at Professorjrod@gmail.com.
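The RAID summary in the episode above (0 for speed, 1 for uptime, 5 and 6 for parity, 10 for mirrored stripes) comes down to simple arithmetic on drive counts. The sketch below is an illustration under stated assumptions, not material from the show: all drives are identical, RAID 1 mirrors every drive, RAID 10 uses two-way mirror pairs, and the minimum drive counts are the commonly cited ones. The names `RaidLevel`, `usable_drives`, and `guaranteed_failures` are hypothetical.

```rust
/// RAID levels named in the episode description.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RaidLevel {
    Raid0,
    Raid1,
    Raid5,
    Raid6,
    Raid10,
}

/// Commonly cited minimum drive counts; RAID 10 also needs an even number.
fn is_valid(level: RaidLevel, drives: u32) -> bool {
    match level {
        RaidLevel::Raid0 | RaidLevel::Raid1 => drives >= 2,
        RaidLevel::Raid5 => drives >= 3,
        RaidLevel::Raid6 => drives >= 4,
        RaidLevel::Raid10 => drives >= 4 && drives % 2 == 0,
    }
}

/// Usable capacity, counted in drives, for `drives` identical disks.
pub fn usable_drives(level: RaidLevel, drives: u32) -> Option<u32> {
    if !is_valid(level, drives) {
        return None;
    }
    Some(match level {
        RaidLevel::Raid0 => drives,      // striping only, no redundancy
        RaidLevel::Raid1 => 1,           // every drive holds the same data
        RaidLevel::Raid5 => drives - 1,  // one drive's worth of parity
        RaidLevel::Raid6 => drives - 2,  // two drives' worth of parity
        RaidLevel::Raid10 => drives / 2, // half the drives are mirrors
    })
}

/// Drive failures the array is guaranteed to survive in the worst case.
pub fn guaranteed_failures(level: RaidLevel, drives: u32) -> Option<u32> {
    if !is_valid(level, drives) {
        return None;
    }
    Some(match level {
        RaidLevel::Raid0 => 0,
        RaidLevel::Raid1 => drives - 1,
        RaidLevel::Raid5 => 1,
        RaidLevel::Raid6 => 2,
        // Worst case: the second failure hits the same mirror pair.
        RaidLevel::Raid10 => 1,
    })
}
```

For four drives, RAID 5 leaves three drives of usable space and tolerates one failure, while RAID 10 leaves two and is only guaranteed to tolerate one. None of these numbers protect against deletion or corruption, which is why the episode insists that RAID is not backup.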
Can't get enough Linux? How about multiple kernels running simultaneously, side by side, not in a VM, all on the same hardware? This week it's finally looking real.
We've been tinkering with a lot of esoteric PC hardware stuff lately, so we're here with a roundup on what we've been up to this week that you'll hopefully find informative. We get into Microsoft's crackdown on the vulnerability in FanControl and other popular monitoring software, attempting to corral fan settings in UEFI as an alternative, and doing battle with the dreaded beat frequencies that can result from adjacent fan placement. Brad also gives a full trip report on his attempt to power a stack of hard drives with an external ATX power supply, with a detour into handy tips for de-pinning a modular power supply cable, stacking multiple hard drives, and more. And Will touches on his recent experience building a new studio PC in a rack-mounted case, plus some tidbits about the last electronics flea market of the year, Linux thread scheduling, Brad's first trip to Micro Center, Will's shiny new CRT (yes, another one), and more!

Links for this episode:
WinRing0: Why Windows is flagging your PC monitoring and fan control apps as a threat: https://www.theverge.com/report/629259/winring0-windows-defender-fan-control-pc-monitoring-alert-quarantine
Noctua on fan placement and beat frequencies: https://noctua.at/en/fan-speed-offset-explained
Stackable hard drive feet Brad bought: https://sednashop.com/index.php?route=product/product&product_id=95
Seasonic pinout and cable compatibility info: https://seasonic.com/cable-compatibility/
How to de-pin a power supply cable with two staples: https://www.youtube.com/watch?v=n6gQ5ie2Dw0
Brad's NAS/hard drive setup and de-pinned cable: https://imgur.com/a/WKPwhCQ

Support the Pod! Contribute to the Tech Pod Patreon and get access to our booming Discord, a monthly bonus episode, your name in the credits, and other great benefits! You can support the show at: https://patreon.com/techpod
This week's technical segment is all about the T-Lora Pager from Lilygo, a really cool Meshtastic device that can also be used for some hacking tasks! In the security news: Your safe is not safe; Cisco ASA devices are under attack; VMScape; HybridPetya and UEFI attacks in the wild; Everything is a Linux terminal; Hackers turns 30; Hosting websites on disposable vapes; NPM worms and token stealing; Attackers make mistakes too; AI podcasts. Visit https://www.securityweekly.com/psw for all the latest episodes! Show Notes: https://securityweekly.com/psw-892
This week we celebrate Linux turning 34, check out a plugin that makes GIMP great for thumbnails, look at a powerful new Jetson dev kit, and try out UEFI on the Rock 5 ITX+ for easy USB Linux installs.
In the security news: Hacking washing machines, good clean fun!; Hacking cars via Bluetooth; More Bluetooth hacking with Breaktooth; Making old vulnerabilities great again: exploiting abandoned hardware; Clorox and Cognizant point fingers; AI generated Linux malware; Attacking Russian airports; When user verification data leaks; Turns out you CAN steal cars with a Flipper Zero, so we're told; The UEFI vulnerabilities - the hits keep coming; Hijacking Discord invites; The Raspberry Pi laptop; The new Hack RF One Pro; Security appliances still fail to be secure; Person Re-Identification via Wi-Fi. Visit https://www.securityweekly.com/psw for all the latest episodes! Show Notes: https://securityweekly.com/psw-885
We chat with Material Security about protecting G Suite and MS365. How else are you monitoring the most commonly used cloud environments and applications? In the security news: Google sues Badbox operators; Authenticated or unauthenticated, big difference, and my struggle to get LLMs to create exploits for me; Ring cameras that were not hacked; Malicious AURs; Killing solar farms; Weak passwords are all it takes; Microsoft's UEFI keys are expiring; Kali Linux and Raspberry Pi Wi-Fi updates; Use lots of electricity, get a visit from law enforcement; SharePoint, vulnerabilities, nuclear weapons, and why you should use the cloud; The time to next exploit is short; SonicWall devices are getting exploited; How not to vibe code; SMS blasters. This segment is sponsored by Material Security. Visit https://securityweekly.com/materialsecurity to see purpose-built Google Workspace and Office 365 security in action! Visit https://www.securityweekly.com/psw for all the latest episodes! Show Notes: https://securityweekly.com/psw-884
In the security news: The train is leaving the station, or is it?; The hypervisor will protect you, maybe; The best thing about Flippers is the clones; Also, the Flipper Zero as an interrogation tool; Threats are commercial and open-source; Who is still down with FTP?; AI bug hunters; Firmware for Russian drones; Merging Android and ChromeOS; Protecting your assets with CVSS?; Patch CitrixBleed 2; Rowhammer comes to NVIDIA GPUs; I hear Microsoft hires Chinese spies; Gigabyte motherboards and UEFI vulnerabilities; McDonald's AI hiring bot: you want some PII with that? Visit https://www.securityweekly.com/psw for all the latest episodes! Show Notes: https://securityweekly.com/psw-883
A BIOS password provides a surprising amount of security on a computer -- so much that if the password is lost, chances for recovery are slim.
This week: You got a Bad box, again Cameras are exposed to the Internet EU and connected devices Hydrophobia NVRAM variables Have you heard about IGEL Linux? SSH and more NVRAM AI skeptics are nuts, and AI doesn't make you more efficient Trump Cybersecurity orders I think I can root my Pixel 6 Decentralized WordPress plugin manager Threat actor naming conventions I have the phone number linked to your Google account Fortinet flaws exploited in ransomware attacks (and how lack of information sharing is killing us) retiring floppy disks fault injection for the masses there is no defender AI blackmails Visit https://www.securityweekly.com/psw for all the latest episodes! Show Notes: https://securityweekly.com/psw-878
In this podcast, Jon Westfall and I covered several topics. A significant portion of our conversation centered on the continuing proliferation of AI in consumer products. We noted an increasing sense of "AI fatigue": the saturation of artificial intelligence in nearly every product and announcement. Although I am personally intrigued by developments in AI-generated video and imaging, especially from Google and Meta, I also find the AI trend overwhelming at times. I am even considering subscribing to Google One's AI Premium offering to further explore these capabilities, particularly for personal creative projects. We also speculated on potential announcements from Apple's upcoming WWDC, especially regarding artificial intelligence and whether Apple will finally deliver tangible AI features, following a less-than-smooth rollout of “Apple Intelligence.” I expressed hope for hardware updates, such as a refreshed Apple Watch Ultra or a more affordable version of the Vision Pro headset, rumored to be called the Vision Air. I noted that I recently began revisiting older episodes of this podcast, some dating back to 2008. I've started re-editing and publishing select episodes as audiograms. One of these featured an interview with the developers of Google Earth for iPhone, recorded in early 2009, just six months after the App Store's debut. It was particularly meaningful to hear the voice of my late friend Mike Morton, one of the app's original developers. We also touched on some of my ongoing technology experiments. I've been attempting to repurpose a 2019 AMD laptop that no longer supports Windows 11. My initial plan to install ChromeOS Flex was thwarted by hardware incompatibility, so I've shifted my attention to Linux Mint. Although I encountered issues related to UEFI preventing boot from a USB drive, I plan to revisit this project soon. Jon offered a compelling perspective on the evolving role of AI in higher education.
He discussed how he and other faculty are adapting to student use of AI tools such as ChatGPT, emphasizing the importance of transparency, responsible use, and pedagogical innovation. Jon's work in this area demonstrates a balanced, practical approach that integrates emerging technology while preserving academic integrity. We concluded the episode with a broader reflection on the societal implications of AI, particularly the concern that up to 50% of entry-level jobs may be impacted in the coming years. As someone no longer in the workforce, I observe these shifts with a mix of concern and curiosity, especially regarding how younger generations will navigate such disruptions. We acknowledged the historical cycles of technological change—from calculators and word processors to broadband and mobile computing—and how each brought both fear and opportunity.
We are calling for the world's best AI Engineer talks for AI Architects, /r/localLlama, Model Context Protocol (MCP), GraphRAG, AI in Action, Evals, Agent Reliability, Reasoning and RL, Retrieval/Search/RecSys, Security, Infrastructure, Generative Media, AI Design & Novel AI UX, AI Product Management, Autonomy, Robotics, and Embodied Agents, Computer-Using Agents (CUA), SWE Agents, Vibe Coding, Voice, Sales/Support Agents at AIEWF 2025! Fill out the 2025 State of AI Eng survey for $250 in Amazon cards and see you from Jun 3-5 in SF!

CoreWeave's now-successful IPO has led to a lot of questions about the GPU Neocloud market, which Dylan Patel has written extensively about on SemiAnalysis. Understanding markets requires an interesting mix of technical and financial expertise, so this will be a different kind of episode than our usual LS domain.

When we first published $2 H100s: How the GPU Rental Bubble Burst, we got 2 kinds of reactions on Hacker News:
* “Ah, now the AI bubble is imploding!”
* “Duh, this is how it works in every GPU cycle, are you new here?”

We don't think either reaction is quite right.
Specifically, it is not normal for the prices of one of the world's most important resources right now to swing from $1 to $8 per hour based on drastically inelastic demand AND supply curves, from 3 year lock-in contracts to stupendously competitive over-ordering dynamics for NVIDIA allocations, especially with increasing baseline compute needed for even the simplest academic ML research and for new AI startups getting off the ground.

We're fortunate today to have Evan Conrad, CEO of SF Compute, one of the most exciting GPU marketplace startups, talk us through his theory of the economics of GPU markets, and why he thinks CoreWeave and Modal are well positioned, but DigitalOcean and Together are not.

However, more broadly, the entire point of SFC is creating liquidity between GPU owners and consumers and making it broadly tradable, even programmable. As we explore, these are the primitives that you can then use to create your own, high quality, custom GPU availability for your time and money budget, similar to how Amazon Spot Instances automated the selective buying of unused compute.

The ultimate end state of where all this is going is GPUs that trade like other perishable, staple commodities of the world: oil, soybeans, milk. Because the contracts and markets are so well established, the price swings also are not nearly as drastic, and people can also start hedging and managing the risk of one of the biggest costs of their business, just like we have risk-managed commodities risks of all other sorts for centuries.
As a former derivatives trader, you can bet that swyx doubleclicked on that…

Show Notes
* SF Compute
* Evan Conrad
* Ethan Anderson
* John Phamous
* The Curve talk
* CoreWeave
* Andromeda Cluster

Full Video Pod

Like and subscribe!

Timestamps
* [00:00:05] Introductions
* [00:00:12] Introduction of guest Evan Conrad from SF Compute
* [00:00:12] CoreWeave Business Model Discussion
* [00:05:37] CoreWeave as a Real Estate Business
* [00:08:59] Interest Rate Risk and GPU Market Strategy Framework
* [00:16:33] Why Together and DigitalOcean will lose money on their clusters
* [00:20:37] SF Compute's AI Lab Origins
* [00:25:49] Utilization Rates and Benefits of SF Compute Market Model
* [00:30:00] H100 GPU Glut, Supply Chain Issues, and Future Demand Forecast
* [00:34:00] P2P GPU networks
* [00:36:50] Customer stories
* [00:38:23] VC-Provided GPU Clusters and Credit Risk Arbitrage
* [00:41:58] Market Pricing Dynamics and Preemptible GPU Pricing Model
* [00:48:00] Future Plans for Financialization?
* [00:52:59] Cluster auditing and quality control
* [00:58:00] Futures Contracts for GPUs
* [01:01:20] Branding and Aesthetic Choices Behind SF Compute
* [01:06:30] Lessons from Previous Startups
* [01:09:07] Hiring at SF Compute

Transcript

Alessio [00:00:05]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my co-host Swyx, founder of Smol AI.

Swyx [00:00:12]: Hey, and today we're so excited to be finally in the studio with Evan Conrad from SF Compute. Welcome. I've been fortunate enough to be your friend before you were famous, and also we've hung out at various social things. So it's really cool to see that SF Compute is coming into its own thing, and it's a significant presence, at least in the San Francisco community, which of course, it's in the name, so you couldn't help but be.

Evan: Indeed, indeed. I think we have a long way to go, but yeah, thanks.

Swyx: Of course, yeah.
One way I was thinking about kicking off this conversation is we will likely release this right after CoreWeave IPO. And I was watching, I was looking, doing some research on you. You did a talk at The Curve. I think I may have been viewer number 70. It was a great talk. More people should go see it, Evan Conrad at The Curve. But we have like three orders of magnitude more people. And I just wanted to highlight, like, what is your analysis of what CoreWeave did that went so right for them?

Evan: Sell locked-in long-term contracts and don't really do much short-term at all. I think like a lot of people had this assumption that GPUs would work a lot like CPUs and the like standard business model of any sort of CPU cloud is you buy commodity hardware, then you lay on services that are mostly software, and that gives you high margins and pretty much all your value comes from those services, not really the underlying compute in any capacity. And because it's commodity hardware and it's not actually that expensive, most of that can be sort of on-demand compute. And while you do want locked-in contracts for folks, it's mostly just a sort of de-risk situation. It helps you plan revenue because you don't know if people are going to scale up or down. But fundamentally, people are like buying hourly and that's how your business is structured and you make 50 percent margins or higher. This like doesn't really work in GPUs. And the reason why it doesn't work is because you end up with like super price sensitive customers. And that isn't because necessarily it's just way more expensive, though that's totally the case. So in a CPU cloud, let's say you had a million dollars of hardware; in GPUs, you have a billion dollars of hardware. And so your customers are buying at much higher volumes than you otherwise expect. And it's also smaller customers who are buying at higher amounts of volume, relative to what they're spending in general.
But in GPUs in particular, your customer cares about the scaling law. So if you take like Gusto, for example, or Rippling or an HR service like this, when they're buying from an AWS or a GCP, they're buying CPUs and they're running web servers, those web servers, they kind of buy up to the capacity that they need, they buy enough, like CPUs, and then they don't buy any more, like, they don't buy any more at all. Yeah, you have a chart that goes like this and then flat. Correct. And it's like a complete flat. It's not even like an incremental tiny amount. It's not like you could just like turn on some more nodes. Yeah. And then suddenly, you know, they would make an incremental amount of money more, like Gusto isn't going to make like, you know, 5% more money, they're gonna make zero, like literally zero money from every incremental GPU or CPU after a certain point. This is not the case for anyone who is training models. And it's not the case for anyone who's doing test time inference or like inference that scales at test time. Because your scaling laws mean that you may have some diminishing returns, but there's always returns. Adding GPUs always means your model does actually get better. And that actually does translate into revenue for you. And then for test time inference, you actually can just like run the inference longer and get a better performance. Or maybe you can run more customers faster and then charge for that. It actually does translate into revenue. Every incremental GPU translates to revenue. And what that means from the customer's perspective is you've got like a flat budget and you're trying to max the amount of GPUs you have for that budget. And it's very distinctly different than like where Gusto or Rippling might think, where they think, oh, we need this amount of CPUs. How do we, you know, reduce that? How do we reduce our amount of money that we're spending on this to get the same amount of CPUs?
What that translates to is customers who are spending in really high volume, but also customers who are super price sensitive, who don't give a s**t. Can I swear on this? Can I swear? Yeah. Who don't give a s**t at all about your software. Because a 10% difference in a billion dollars of hardware is like $100 million of value for you. So if you have a 10% margin increase because you have great software, on your billion, the customers are that price sensitive. They will immediately switch off if they can. Because why wouldn't you? You would just take that $100 million. You'd spend $50 million on hiring a software engineering team to replicate anything that you possibly did. So that means that the best way to make money in GPUs was to do basically exactly what CoreWeave did, which is go out and sign only long-term contracts, pretty much ignore the bottom end of the market completely, and then maximize your long-term contracts. With customers who don't have credit risk, who won't sue you, or are unlikely to sue you for frivolous reasons. And then because they don't have credit risk and they won't sue you for frivolous reasons, you can go back to your lender and you can say, look, this is a really low risk situation for us to do. You should give me prime, prime interest rate. You should give me the lowest cost of capital you possibly can. And when you do that, you just make tons of money. The problem that I think lots of people are going to talk about with CoreWeave is it doesn't really look like a cloud platform. It doesn't really look like a cloud provider financially. It also doesn't really look like a software company financially.Swyx [00:05:37]: It's a bank.Evan [00:05:38]: It's a bank. It's a real estate company. And it's very hard to not be that. The problem of that that people have tricked themselves into is thinking that CoreWeave is a bad business. I don't think CoreWeave is explicitly a bad business. 
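The switching arithmetic Evan describes can be sketched in a few lines of Python. The three dollar figures are the round numbers used in the conversation; the function name `margin_at_stake` is purely illustrative and not from any real tool.

```python
def margin_at_stake(hardware_spend: float, software_margin: float) -> float:
    """Dollar value of a percentage margin charged on top of a hardware spend."""
    return hardware_spend * software_margin


# Round numbers quoted in the conversation (not real pricing data).
hardware_spend = 1_000_000_000  # "a billion dollars of hardware"
software_margin = 0.10          # "a 10% margin increase because you have great software"
rebuild_cost = 50_000_000       # "$50 million on hiring a software engineering team"

premium = margin_at_stake(hardware_spend, software_margin)
savings_from_switching = premium - rebuild_cost

print(f"Premium paid for the provider's software: ${premium:,.0f}")
print(f"Net saved by switching and rebuilding in-house: ${savings_from_switching:,.0f}")
```

Under these assumed numbers the customer still comes out ahead after paying to rebuild the software, which is the sense in which a large buyer is described as price sensitive.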
There's a bunch of people, there's kind of like two versions of the CoreWeave take at the moment. There's, oh my God, CoreWeave, amazing. CoreWeave is this great new cloud provider competitive with the hyperscalers. And to some extent, this is true from a structural perspective. Like, they are indeed a real sort of thing against the cloud providers in this particular category. And the other take is, oh my gosh, CoreWeave is this horrible business and so on and blah, blah, blah. And I think it's just like a set of perception or perspective. If you think CoreWeave's business is supposed to look like the traditional cloud providers, you're going to be really upset to learn that GPUs don't look like that at all. And in fact, for the hyperscalers, it doesn't look like this either. My intuition is that the hyperscalers are probably going to lose a lot of money, and they know they're going to lose a lot of money on reselling NVIDIA GPUs, at least. Hyperscalers, but I want to, Microsoft, AWS, Google. Correct, yeah. The Microsoft, AWS, and Google. Does Google resell? I mean, Google has TPUs. Google has TPUs, but I think you can also get H100s and so on. But there are like two ways they can make money. One is by selling to small customers who aren't actually buying in any serious volume. They're testing around, they're playing around. And if they get big, they're immediately going to do one of two things. They're going to ask you for a discount. Because they're not going to pay your crazy sort of margin that you have locked into your business. Because for CPUs, you need that. They're going to pay your massive per hour price. And so they want you to sign a long-term contract. And so that's your other way that you can make money, is you can basically do exactly what CoreWeave does, which is have them pay as much as possible upfront and lock in the contract for a long time. Or you can have small customers. But the problem is that for a hyperscaler, the GPUs to... 
To sell on the low margins relative to what your other business, your CPUs are, is a worse business than what you are currently doing. Because you could have spent the same money on those GPUs. And you could have trained a model and you could have made a model on top of it and then turn that into a product and had high margins from your product. Or you could have taken that same money and you could have competed with NVIDIA. And you could have cut into their margin instead. But just simply reselling NVIDIA GPUs doesn't work like your CPU business. Where you're able to capture high margins from big customers and so on. And then they never leave you because your customers aren't actually price sensitive. And so they won't switch off if your prices are a little higher. You actually had a really nice chart, again, on that talk of this two by two. Sure. Of like where you want to be. And you also had some hot takes on who's making money and who isn't. Swyx: So CoreWeave locked up long-term contracts. Get that. Yes. Maybe share your mental framework. Just verbally describe it because we're trying to help the audio listeners as well. Sure. People can look up the chart if they want to. Evan: Sure. Okay. So this is a graph of interest rates. And on the y-axis, it's a probability you're able to sell your GPUs from zero to one. And on the x-axis, it's how much they'll depreciate in cost from zero to one. And then you had iso-cost curves or iso-interest-rate curves. Yeah. So they kind of shape in a sort of concave fashion. Yeah. The lowest interest rates enable the most aggressive form of this cost curve. And the higher interest rates go, the more you have to push out to the top right. Yeah. And then you had some analysis of where every player sits in this, including CoreWeave, but also Together and Modal and all these other guys. I thought that was super insightful. So I just wanted to elaborate.
Basically, it's like a graph of risk and the genres of places where you can be and what the risk is associated with that. The optimal thing for you to do, if you can, is to lock in long-term contracts that are paid all up front or in a situation in which you trust the other party to pay you over time. So if you're, you know, selling to Microsoft or something or OpenAI. Which are together 77% of the revenue of CoreWeave. Yeah. So if you're doing that, that's a great business to be in because your interest rate that you can pitch for is really low because no one thinks Microsoft is going to default. And like maybe OpenAI will default, but the backing by Microsoft kind of doesn't. And I think, generally, it looks like OpenAI is winning, so you can make just a much better case than if you're selling to the pre-seed startup that just raised $30 million or something pre-revenue. It's like way easier to make the case that OpenAI is not going to default than the pre-seed startup. And so the optimal place to be is selling to the maximally low risk customer for as long as possible. And then you never have to worry about depreciation and you make lots of money. The less good place to be is you could sell long-term contracts to people who might default on you. And then if you're not bringing it to the present, so you're not like saying, hey, you have to pay us all up front, then you're in this like more risky territory. So is it top left of the chart? If I have the chart right, maybe. Large contracts paid over time. Yeah. Large contracts paid over time is like top left. So it's more risky, but you could still probably get away with it. And then the other opportunity is that you could sell short-term contracts for really high prices. And so lots of people tried that too, because this is actually closer to the original business model that people thought would work in cloud providers for CPUs.
It works for CPUs, but it doesn't really work for GPUs. And I don't think people were trying this because they were thinking about the risk associated with it. I think a lot of people just come from a software background and have not really thought about like COGS or margins or inventory risk or things that you have to worry about in the physical world. And I think they were just like copy pasting the same business model onto GPUs. And also, I remember fundraising like a few years ago. And I know, based on what we knew other people were saying who were in a very similar business to us versus what we were saying, that our pitch was way worse at the time, because in the beginning of SF Compute, we looked very similar to pretty much every other GPU cloud, not on purpose, but sort of accidentally. And I know that the correct pitch to give to an investor was we will look like a traditional CPU cloud with high margins and we'll sell to everyone. And that is a bad business model because your customers are price sensitive. And so what happens is if you sell at high prices, which is the price that you would need to sell it in order to de-risk your loss on the depreciation curve, and specifically what I mean by that is like, let's say you're selling it at like $5 an hour and you're paying $1.50 an hour for the GPU under the hood. It's a little bit different than that, but you know, nice numbers, $5 an hour, $1.50 an hour. Great. Excellent. Well, you're charging a really high price per GPU hour because over time the price will go down and you'll get competed out. And what you need is to make sure that you never go under, or if you do go under your underlying cost, you've made so much money in the first part of it that the later end of it, like doesn't matter because from the whole structure of the deal, you've made money. The problem is that just, you think that you're going to be able to retain your customers with software.
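To make the "$5 an hour, $1.50 an hour" reasoning concrete, here is a minimal Python sketch of a front-loaded deal. Only the two prices come from the conversation. The monthly decline rates, the 36-month horizon, the 730 hours per month, and the function name `cumulative_profit` are all hypothetical assumptions chosen for illustration.

```python
def cumulative_profit(start_price, cost, monthly_decline, months, hours_per_month=730):
    """Running profit per GPU, month by month, while the sale price decays geometrically."""
    total = 0.0
    history = []
    price = start_price
    for _ in range(months):
        total += (price - cost) * hours_per_month  # this month's margin, possibly negative
        history.append(total)
        price *= 1 - monthly_decline               # price erodes before the next month
    return history


# Assumed scenario 1: price erodes slowly (5% per month) from $5.00 against a $1.50 cost.
gradual = cumulative_profit(start_price=5.00, cost=1.50, monthly_decline=0.05, months=36)

# Assumed scenario 2: price-sensitive customers push the price down faster (15% per month).
squeezed = cumulative_profit(start_price=5.00, cost=1.50, monthly_decline=0.15, months=36)

print(f"Gradual decline, profit per GPU over 36 months:  ${gradual[-1]:,.0f}")
print(f"Squeezed decline, profit per GPU over 36 months: ${squeezed[-1]:,.0f}")
```

In the gradual case the early high-margin months outweigh the later months sold below cost, which is the structure described above. In the squeezed case the price falls under cost too early for the front-loaded margin to cover the rest of the term.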
And actually what happens is your customers are super price sensitive and push you down and push you down and push you down, and they don't care about your software at all. And then the other problem that you have is you have really big players like the hyperscalers who are looking to win the market, and they have way more money than you, and they can push down on margin much better than you can. They don't necessarily do it all the time; I think they actually keep a higher margin. But if they needed to, they could totally just wreck your margin at any point and push you down. That meant that that quadrant over there, where you're charging a high price just to make up for the risk, completely got destroyed. It did not work at all for many places because of the price sensitivity, because people could just shove you down. Instead, that pushed everybody up to the top right-hand corner of that, which is selling short-term contracts for low prices paid over time, which is the worst financial place to be in because it has the highest interest rate, which means that your costs go up at the same time your incoming cash goes down, and that squeezes your margins. The nice thing for a CoreWeave is that most of their business is over on the other sides of those quadrants, the ones that survive. The only remaining question I have with CoreWeave, and I promise I'll get to SF Compute, and I promise this is relevant to SF Compute in general, because the framework is important, right? Sure. To understand the company. So why didn't NVIDIA or Microsoft, both of which have more money than CoreWeave, do CoreWeave, right? Why didn't they do CoreWeave?
Why have this middleman when either NVIDIA or Microsoft have more money than God, and they could have done an internal CoreWeave, which is effectively like a self-funding vehicle, like a financial instrument. Why does there have to be a third party? Your question is like... Why didn't Microsoft, or why didn't NVIDIA just do CoreWeave? Why didn't they just set up their own cloud provider? I think, and I don't know, and so correct me if I'm wrong, and lots of people will have different opinions here, or I mean, not opinions, they'll have actual facts that differ from my facts. Those aren't opinions. Those are actually indeed differences of reality. But I think NVIDIA doesn't want to compete with their customers. They make a large amount of money by selling to existing clouds. If they launched their own CoreWeave, it'd make it much harder for them to sell to the hyperscalers, and so they have a complex relationship there. So not great for them. Second is that, at least for a while, I think they were dealing with antitrust concerns or fears that if they own too many layers of the stack, I could imagine that could be a problem for them. I don't know if that's actually true, but that's where my mind would go, I guess. Mostly, I think it's the first one. It's that they would be competing directly with their primary customers. Then Microsoft could have done it, right? That's the other question. Yeah, so Microsoft didn't do it. And my guess is that NVIDIA doesn't want Microsoft to do it, and so they would limit the capacity because from NVIDIA's perspective, both they don't want to necessarily launch their own cloud provider because it's competing with their customers, but also they don't want only one customer or only a few customers.
It's really bad for NVIDIA if you have customer concentration, and Microsoft and Google and Amazon, like Oracle, to buy up your entire supply, and then you have four or five customers or so who pretty much get to set prices. Monopsony. Yeah, monopsony. And so the optimal thing for you is a diverse set of customers who all are willing to pay at whatever price, because if you don't, somebody else will. And so it's really optimal for NVIDIA to have lots of other customers who are all competing against each other. Great. Just wanted to establish that. It's unintuitive for people who have never thought about it, and you think about it all day long. Yeah. Swyx: The last thing I'll call out from the talk, which is kind of cool, and then I promise we'll get to SF Compute, is why will DigitalOcean and Together lose money on their clusters? Why will DigitalOcean and Together lose money on their clusters?Evan [00:16:33]: I'm going to start by clarifying that all of these businesses are excellent and fantastic. That Together and DigitalOcean and Lambda, I think, are wonderful businesses who build excellent products. But my general intuition is that if you try to couple the software and the hardware together, you're going to lose money. That if you go out and you buy a long-term contract from someone and then you layer on services, or you buy the hardware yourself and you spin it up and you get a bunch of debt, you're going to run into the same problem that everybody else did, the same problem we did, same problem the hyperscalers did. And that's exactly what the hyperscalers are doing, which is you cannot add software and make high margins like a cloud provider can. You can pitch that into investors and it will totally make sense, and it's like the correct play in CPUs, but there isn't software you could make to make this occur. If you're spending a billion dollars on hardware, you need to make a billion dollars of software. 
There isn't a billion dollars of software that you can realistically make, and if you do, you're going to look like SAP. And that's not a knock on SAP. SAP makes a f**k ton of money, right? Right. There aren't that many pieces of software that you could make, that you can realistically sell, like a billion dollars of software, and you're probably not going to do it to price-sensitive customers who are spending their entire budget already on compute. They don't have any more money to give you. It's a very hard proposition to do. And so many parties have been trying to do this, like, buy their own compute, because that's what a traditional cloud does. It doesn't really work for them. You know that meme where there's, like, the Grim Reaper? And he's, like, knocking on the door, and then he keeps knocking on the next door? We have just seen door after door after door as the Grim Reaper comes by, and the economic realities of the compute market come knocking. And so the thing we encourage folks to do is if you are thinking about buying a big GPU cluster and you are going to layer on software on top, don't. There are so many dead bodies in the wake there. We would recommend not doing that. And we, as SF Compute, our entire business is structured to help you not do that. It helps disintegrate these. The GPU clouds are fantastic real estate businesses. If you treat them like real estate businesses, you will make a lot of money. The cloud services you can make on that, all the software you want to make on that, you can do that fantastically if you don't own the underlying hardware. If you mix these businesses together, you get shot in the head. But if you split them, and that's what the market does, it helps you split them, it allows you to layer on services but just buy from the market, you can make lots of money.
So companies like Modal, who don't own the underlying compute, like they don't own it, lots of money, fantastic product. And then companies like CoreWeave, who are functionally like really, really good real estate businesses, lots of money, fantastic product. But if you combine them, you die. That's the economic reality of compute. I think it also splits into training versus inference, which are different kinds of workloads. Yeah. And then, yeah, one comment about the price sensitivity thing before we leave this topic. I want to credit Martin Casado for coining or naming this thing, which is like, you know, you said this thing about like, you don't have room for a 10% margin on GPUs for software. Yep. And Martin actually played it out further. He's the first one I ever saw doing this for large enough runs. So let's say GPT-4 and o1 both had a total training cost of like $500 million, is the rough estimate. When you get the $5 billion runs, when you get the $50 billion runs, it actually makes sense to build your own. You're going to have to get into chips, like for OpenAI to get into chip design, which is so funny. I would make an ASIC for this run. Yeah, maybe. I think a caveat of that that is not super well thought about is that only works if you're really confident. It only works if you really know which chip you're going to do. If you don't, then it's a little harder. So in my head, it makes more sense for inference where you've already established it. But for training there's so much like experimentation. Any generality, yeah. Yeah. The generality is much more useful. Yeah. In some sense, you know, Google's like six generations into the TPUs. Yeah. Yeah. Okay, cool. Maybe we should go into SF Compute now. Sure. Yeah.

Alessio [00:20:37]: Yeah. So you kind of talked about the different providers.
Why did you decide to go with this approach, and maybe talk a bit about how the market dynamics have evolved since you started the company?
Evan [00:20:47]: So originally we were not doing this at all. We were definitely forced into this to some extent. SF Compute started because we wanted to go train models for music and audio in general. We were going to do a sort of generic audio model at some points, and then we were going to do a music model at some points. It was an early company. We didn't really spec down on a particular thing. But yeah, we were going to do a music model and an audio model. The first thing that you do when you start any AI lab is you go out and you buy a big cluster. The thing we had seen everybody else do was they went out and raised a really big round, and then they would get stuck. Because if you raise the amount of money that you need to train a model initially, like, you know, the $50 million pre-seed, pre-revenue, your valuation is so high or you get diluted so much that you can't raise the next round. And that's a very big ask to make. And also, I don't know, we just felt like we couldn't do it. We probably could have in retrospect, but one, we didn't really feel like we could do it. Two, it felt like if we did, we would have been stuck later on. We didn't want to raise the big round. And so instead, we thought, surely by now, we would be able to just go out to any provider and buy what a traditional CPU cloud would offer you: just buy on demand, or buy a month, and so on. And this worked for small incremental things. And I think this is what we were basing it off. We just assumed we could go to Lambda or something and buy thousands of, at the time, A100s. And this was just not at all the case. So we started doing all the sales calls with people and we said, OK, well, can we just get month to month? Can we get one month of compute or so on? 
Everyone told us at the time, no. You need to have a year-long contract or longer, or you're out of luck. Sorry. And at the time, we were just pissed off. Like, why won't anybody sell us a month at a time? Nowadays, we totally understand why, because it's the same economic reason. If they had sold us month to month and we had canceled, they would have had massive risk on that. And so the optimal thing to do was to just completely abandon that section of the market. We didn't like that. So our plan was we were going to buy a year-long contract anyway. We would use a month, and then we would lease out the other 11 months. We were locked in for a year, but we only had to pay month by month. And so we did this. But then immediately we said, oh, s**t, now we are a cloud provider, not a training-models company, not an AI lab, because every 30 days we owed about five hundred thousand dollars and we had about five hundred thousand dollars in the bank. So that meant that every single month, if we did not sell out our cluster, we would just go bankrupt. So that's what we did for the first year of the company. And when you're in that position, you try to think how in the world you get out of it. What that transitioned to is, OK, well, we tend to be pretty good at selling this cluster every month, because we haven't died yet. And so what we should do is go be a broker for other people, and we will be more like a GPU real estate agent, a GPU realtor. And so we did that for a while, where we would go to people who were trying to sell a year-long contract, and we'd go to another person, like maybe this person wanted six months and somebody else wanted six months or something, and we'd combine all these people. 
Together to make the deal happen and we'd organize these like one off bespoke deals that looked like basically it ended up with us taking a bunch of customers, us signing with a vendor, taking some cut and then us operating the cluster for people typically with bare metal. And so we were doing this, but this was definitely like a oh, s**t, oh, s**t, oh, s**t. How do we get out of our current situation and less of a like a strategic plan of any sort? But while we were doing this, since like the beginning of the company, we had been thinking about how to buy GPU clusters, how to sell them effectively, because we'd seen every part of it. And what we ended up with was like a book of everybody who's trying to buy and everyone is trying to sell because we were these like GPU brokers. And so that turned into what is today SF Compute, which is a compute market, which we think we are the functionally the most liquid GPU market of any capacity. Honestly, I think we're the only thing that actually is like a real market that there's like bids and asks and there's like a like a trading engine that combines everything. And so. I think we're the only place where you can do things that a market should be able to do. Like you can go on SF Compute today and you get thousands of H100s for an hour if you want. And that's because there is a price for thousands of GPUs for an hour. That is like not a thing you can reasonably do on kind of any other cloud provider because nobody should realistically sell you thousands of GPUs for an hour. They should sell it to you for a year or so on. But one of the nice things about a market is that you can buy the year on SF Compute. But then if you need to sell. Back, you can sell back as well. And that opens up all these little pockets of liquidity where somebody who's just trying to buy for a little bit of time, some burst capacity. So people don't normally buy for an hour. 
That's not actually a realistic thing, but it's like the range: somebody who is like us, who needed to buy for a month, can actually buy for a month. They can place the order and there is actually a price for that. And it typically comes from somebody else who's selling back: somebody who bought a longer-term contract for some period of time, their code doesn't work, and now they need to sell off a little bit.
Alessio [00:25:49]: What are the utilization rates at which a market like this works? What do you see as the usual GPU utilization rate, and at what point does the market get saturated?
Evan [00:26:00]: Assuming there are no hardware problems or software problems, the utilization rate is near 100 percent, because the price dips until the utilization is 100 percent. So the price actually has to dip quite a lot in order for the utilization not to be 100 percent. That's not always the case, because you just have logistical problems: you get a cluster and parts of the InfiniBand fabric are broken, or there's some issue with some switch somewhere, and so you have to take some portion of the cluster offline, or stuff like this. There are just underlying physical realities of the clusters. But nominally we have better utilization than basically anybody. That's utilization of the cluster, which doesn't necessarily translate into money, though I actually do think we make much better overall money for our underlying vendors than kind of anybody else. We work with the other GPU clouds, and the basic pitch to the other GPU clouds is one. 
We can be your broker, so we can find you the long-term contracts at the prices that you want. Meanwhile, your cluster is idle, and for that we can increase your utilization and get you more money, because we can sell that idle cluster for you. Then the moment we find the bigger customer and they come on, you can kick off those people and go to the other ones. You get kind of a mix: sell your cluster at whatever price you can get on the market, and then sell your cluster at the big price that you want for a long-term contract, which is your ideal business model. And then the benefit of the whole thing being on the market is you can pitch your customer that they can cancel their long-term contract, which is not a thing that you can reasonably do if you are just the GPU cloud. If you're just the GPU cloud, you can never let them cancel the contract, because that introduces so much risk that you would otherwise not get your cheap cost of capital or whatever. But if you're selling it through the market, or you're selling it with us, then you can say, hey, look, you can cancel for a fee. And that fee is the difference between the price of the market and the price that they paid, which means that they can cancel and you have the ability to offer that flexibility, but you don't have to take the risk of it. The money's already there and you got paid, but it's just being sold to somebody else. One of our top pieces from last year was talking about the H100 glut from all the long-term contracts that were not being fully utilized and being put on the market. You have on here dollar-per-hour contracts, and it goes up to two. Actually, I think you were involved. You were obliquely quoted in that article. I think you remember. I remember because this was hidden. Well, we hid your name, but then you were like, yeah, it's us. Yeah. Could you talk about the supply and demand of H100s? 
Was that just a normal cycle? Was that a super cycle because of all the VC funding that went in in 2023? What was that like? GPU prices have come down. Yeah, GPU prices have come down. And some part of that is the normal depreciation cycle. Some part of that is just that there were a lot of startups that bought GPUs and never used them. And now they're lending them out, and therefore you exist. There are a lot of various theories as to why this happened. I dislike all of them, because they're often said with really high confidence, and I think the market's just much more complicated than that. Of course. And so everything I'm going to say is very hedged. But there was a series of places where a bunch of orders were placed, and people were pitching to their customers and their investors and just the broader market that they would arrive on time. And that is not how the world works. Because there was such a quick build-out, you would end up with bottlenecks in the supply chain somewhere that had nothing to do with the chip necessarily. It's the InfiniBand cables or the NICs or whatever. Or you need a bunch of generators, or you don't have data center space. There's always some bottleneck somewhere else. And so a lot of the clusters didn't come online within that period of time. But then all the bottlenecks got sorted out, and they all came online at the same time. So I think you saw a shortage because supply chains are hard, and then you saw a glut because the supply chain eventually figured itself out. And specifically, people overordered in order to get the allocation that they wanted. Then they got the allocations and then they went under. Yeah, whatever. Right. There were just a lot of shenanigans. A caveat of this is that every time you see that somebody overordered, there is this assumption that the problem was that demand went down. 
I don't think that's the case at all. And so I want to clarify that. It definitely seems like a shortage. Like there's more demand for GPUs than there ever was. It's just that there was also more supply. So at the moment, I think there is still functionally a glut. But the difference that I think is happening is mostly the test-time inference stuff: you just need way more chips for that than you did before. And whenever you make a statement about the current market, people take your words and assume that you're making a statement about the future market. So if you say there's a glut now, people will continue to think there's a glut. But I think what is happening at the moment... my general prediction is that by the winter, we will be back towards shortage. But then also, this very much depends on the rollout of future chips. And that comes with its own... I think I'm trying to give you a good 'here's Evan's forecast.' Okay. But I don't know if my forecast is right. You don't have to. Nobody is going to hold you to it. But I think people want to know what's true and what's not. And there's a lot of vague speculation from people who are not that close to the market, actually. And you are. I think I'm closer to the market, but also a vague speculator. I think there are a lot of really highly confident speculators, and I am indeed a vague speculator. I think I have more information than a lot of other people, and this makes me a vaguer speculator, because I feel less certain or less confident than I think a lot of other people do. The thing I do feel reasonably confident about saying is that test-time inference is probably going to quite significantly expand the amount of compute that is used for inference. So, a caveat: pretty much all the inference demand is in a few companies. A good example is that lots of bio and pharma was using H100s for training bio models of sorts. 
And they would come along and they would buy, you know, thousands of H100s for training, and then just not a lot of stuff for inference. Not relative to an OpenAI or Anthropic or something, because they don't have a consumer product. Their inference event, if they can do it right... there's really only one inference event that matters. And obviously I think they're going to run it in batch, and they're not going to literally just run one inference event. But the one that produces the drug is the important one. Right. And I'm dumb and I don't know anything about biology, so I could be completely wrong here. But my understanding is that's kind of the gist. I can check that for you. You can check that for me. But my understanding is that the one that produces the sequence that is the drug that, you know, cures cancer or whatever, that's the important deal. But a lot of models look like this, where they're more enterprise use cases. So prior to something that looks like test-time inference, you got lots and lots of demand for training, and then it pretty much entirely fell off for inference. And we looked at OpenRouter, for example: the entirety of OpenRouter that was not Anthropic or Gemini or OpenAI or something was like 10 H100 nodes or something like that. It's just not that much. It's not that many GPUs, actually, to service that entire demand. But that's a really sizable portion of the sort of open source market. The actual amount of compute needed for it was not that much. But if you imagine what an OpenAI needs for GPT-4, it's tremendously big. But that's because it's a consumer product that has almost all the inference demand. Yeah, that's a message we've had. Roughly, open source AI compared to closed AI is like 5%. Yeah, it's super small. Super small. 
But test-time inference changes that quite significantly. So I will expect that to increase our overall demand. But my question on whether or not that actually affects your compute price is entirely based on how quickly we roll out the next chips. The way that you burst is different for test time.
Alessio [00:34:01]: Any thoughts on the third part of the market, which is the more peer-to-peer, distributed, some of them crypto-enabled, like Hyperbolic, Prime Intellect, and all of that? Where do those fit? Do you see that a lot of people will want to participate in a peer-to-peer market? Or, just because of the capital requirements, at the end of the day it doesn't really matter?
Evan [00:34:20]: I'm wildly skeptical of these, to be frank. The dream is like SETI@home, right? I've got this 5090. Nobody has a 5090. A 4090 sitting at home. I can rent it out. Yeah. I just don't really think this is ever going to be more efficient than a fully interconnected cluster with InfiniBand or, you know, whatever the next spec might be. I could be completely wrong. But the speed of light is really hard to beat. And regardless of whatever you're using, you just can't get around that physical limitation. And so you could imagine a decentralized market that still has a lot of places where there's co-location. But then you would get something that looks like SF Compute. And so that's what we do. Our general take is that on SF Compute, you're not buying from random people. You're buying from the other GPU clouds, functionally. You're buying from data centers that are the same genre of people that you would work with already. And you can specify, oh, I want all these nodes to be co-located. And I don't think you're really going to get around that. And I think I buy crypto for the purposes of transferring money. The financial system is quite painful and so on. 
I can understand the uses of it to incentivize an initial market or to try to get around the cold start problem. We've been able to get around the cold start problem just fine, so we didn't actually need that at all. What I do think is totally possible is you could launch a token and then you could subsidize the compute prices for a bit, and maybe that will help you. I think that's what Nous is doing. Yeah, I think there are lots of people who are trying to do things like this, but at some point that runs out. So I would, I think, generally agree. I think the only thread in that model is very fine-grained mixture of experts, where algorithms can shift to adapt to hardware realities. And the hardware reality is, okay, it's annoying to do large co-located clusters, so we'll just redesign attention or whatever in our architecture to distribute it more. There was a little bit of buzz about block attention last year that Strong Compute made a big push on. But I think, you know, in a world where we have 200 experts in an MoE model, it starts to be a little bit better. I don't disagree with this. I can imagine a world in which you've redesigned it to be more parallelizable, like across space.
Evan [00:36:43]: But without that, your hardware limitation is your speed-of-light limitation. And that's a very hard one to get around.
Alessio [00:36:50]: Any customers or stories that you want to shout out, maybe things that wouldn't have been economically viable otherwise? I know there's some sensitivity on that.
Evan [00:37:00]: My favorites are grad students, folks who are trying to do things that would normally require the scale of a big lab. And the grad students are like the worst pilots. 
They're the worst possible customer for the traditional GPU clouds, because they will immediately churn if you sell them a thing, because they're going to graduate and they're not going to go anywhere. That project isn't going to continue to spend lots of money. Sometimes it does, but not if you're working with the university or with a lab of some sort. But a lot of times it's just the ability for us to offer big burst capacity, which I think is lovely and wonderful. It's one of my favorite things to do, because all those folks look like we did. And I have a special place in my heart for that. I have a special place in my heart for young hackers and young grad students and researchers who are trying to do the same genre of thing that we are doing. For the same reason, I have a special place in my heart for the startups, the people who are actively trying to compete on the same scale, but can't afford it time-wise, but can afford it spike-wise. Yeah, I liked your example of, I have a grant of 100K and it's expiring, I've got to spend it on that. That's really beautiful. Yeah. Interesting. Has there been interesting work coming out of that? Anything you want to mention? Yeah. So from a startup perspective, Standard Intelligence and Phind, P-H-I-N-D. We've had them on the pod.
Swyx [00:38:23]: Yeah. Yeah.
Evan [00:38:23]: That was great. And then from the grad students' perspective, we worked a lot with the Schmidt Futures grantees of various sorts. My fear is that if I talk about their research, I will be completely wrong to a sort of almost insulting degree, because I am very dumb. But yeah. I think one thing that's maybe also relevant, startups- and GPUs-wise. Yeah. There was a brief moment where it kind of made sense that VCs provided GPU clusters. And obviously you worked at AI Grant, which set up Andromeda, which is supposedly a $100 million cluster. Yeah. 
I can explain why that's the case or why anybody would think that would be smart. Because I remember before any of that happened, we were asking for it to happen. Yeah. And the general reason is credit risk. Again, it's a bank. Yeah. I have lower risk than you due to credit transformation. I take your risk onto my balance sheet. Correct. Exactly. If you wanted to go for a while, if you wanted to go set up a GPU cluster, you had to be the one that actually bought the hardware and racked it and stacked it, like co-located it somewhere with someone. Functionally, it was like on your balance sheet, which means you had to get a loan. And you cannot get a loan for like $50 million as a startup. Like not really. You can get like venture debt and stuff, but like it's like very, very difficult to get a loan of any serious price for that. But it's like not that difficult to get a loan for $50 million. If you already have a fund or you already have like a million dollars under your assets somewhere or like you personally can like do a personal guarantee for it or something like this. If you have a lot of money, it is way easier for you to get a loan than if you don't have a lot of money. And so the hack of a VC or some capital partner offering equity for compute is always some arbitrage on the credit risk. That's amazing. Yeah. That's a hack. You should do that. I don't think people should do it right now. I think the market has like, I think it made sense at the time and it was helpful and useful for the people who did it at the time. But I think it was a one-time arbitrage because now there are lots of other sources that can do it. And also I think like it made sense when no one else was doing it and you were the only person who was doing it. But now it's like it's an arbitrage that gets competed down. Sure. So it's like super effective. I wouldn't totally recommend it. Like it's great that Andromeda did it. 
But the marginal increase of somebody else doing it is not super helpful. I don't think that many people have followed in their footsteps. I think maybe Andreessen did it. Yeah. That's it. I think just because pretty much all the value flows through Andromeda. What? That cannot be true. How many companies are in AI Grant? Like 50? My understanding of Andromeda is that it works with all the NFDG companies, or several of the NFDG companies. But I might be wrong about that. Again, you know, something something. Nat, don't kill me. I could be completely wrong. But, you know, I think Andromeda was an excellent idea to do at the time in which it occurred. Perfect. His timing is impeccable. Timing. Yeah. Nat and Daniel are like, I mean, there's lots of people who are like... Seers? Yeah. Seers. Like S-E-E-R. Oh, seers. Like seers of the Valley. Yeah. For years and years before any of the ChatGPT moment or anything, they had fully understood what was going to happen. Like way, way before. AI Grant is like five years old, six years old or something like that. Seven years old. When it first launched or something. Depends where you start. The nonprofit version. Yeah. The nonprofit version was happening for a while, I think. It's been going on for quite a bit of time. And then Nat and Daniel are the early investors in a lot of the early AI labs of various sorts. They've been doing this for a bit.
Alessio [00:41:58]: I was looking at your pricing yesterday. We were kind of talking about it before. And there's this weird thing where one week is more expensive than both one day and one month. Yeah. What are some of the market pricing dynamics? To somebody that is not in the business, this looks really weird. But I'm curious if you have an explanation for it, if that looks normal to you. 
Yeah.
Evan [00:42:18]: So the simple answer is that preemptible pricing is cheaper than non-preemptible pricing. And the same economic principle is the reason why that's the case right now. That's not entirely true on SF Compute. SF Compute doesn't really have the concept of preemptible. Instead, what it has is very short reservations. So, you know, you go to a traditional cloud provider and you can say, hey, I want a reserved contract for a year. We will let you do a reserved contract for one hour, which is part of SFC. And what you can do is just buy every single hour continuously. You're reserving just for that hour, and then the next hour you reserve just for that next hour. And this is obviously an automation that you can do. But what you're seeing when you see the cheap price is somebody who's buying the next hour, but not necessarily buying an hour after that. So if the price goes up too much, they might not get that next hour. And the underlying part of where that's coming from in the market is, you can imagine day-old milk, or milk that's about to be old. It might drop its price until it expires, because nobody wants to buy milk that's past its date. Or maybe you can't legally sell it. Compute is the same way. You can't sell a block of compute that is in the past. And so what you should do in the market, and what people do do, is they take a block of compute, and then they drop it and drop it and drop it down to a floor price right before it's about to expire. And they keep dropping it until it clears. And so anything that is idle drops until some point. So if you go on the website and you set that chart to a week from now, what you'll see is much more normal-looking curves. 
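The expiring-milk behavior described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the straight-line decay, the function names `ask_schedule` and `first_clearing_hour`, and the example prices are all hypothetical, and nothing in the conversation says this is how SF Compute's trading engine or any seller actually sets prices.

```python
def ask_schedule(start_price, floor_price, hours_to_expiry):
    """Return one ask per remaining hour, falling linearly to the floor.

    Assumption: a straight-line decay. A real seller could use any curve.
    """
    if hours_to_expiry < 1:
        raise ValueError("need at least one hour before expiry")
    if floor_price > start_price:
        raise ValueError("floor must not exceed the starting price")
    if hours_to_expiry == 1:
        return [floor_price]
    step = (start_price - floor_price) / (hours_to_expiry - 1)
    return [round(start_price - step * hour, 4) for hour in range(hours_to_expiry)]


def first_clearing_hour(asks, bids):
    """Return (hour, price) for the first hour where the best bid meets the ask."""
    for hour, (ask, bid) in enumerate(zip(asks, bids)):
        if bid >= ask:
            return hour, ask
    return None  # the block expired unsold


# Hypothetical block: starts at $2.00 per GPU-hour, floor of $1.00, 5 hours left.
asks = ask_schedule(start_price=2.00, floor_price=1.00, hours_to_expiry=5)
print(asks)                                   # [2.0, 1.75, 1.5, 1.25, 1.0]
print(first_clearing_hour(asks, [1.30] * 5))  # (3, 1.25)
```

With a constant best bid of $1.30, the block stays idle until the ask has fallen to $1.25, which is the "keep dropping it until it clears" idea in miniature.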
But if you say, oh, I want to start right now, this immediate instant, the compute that I want right now is functionally at the preemptible price. It's where most people are getting the best compute prices from. The caveat of that is you can do really fun stuff on SFC if you want. Because it's not actually preemptible, it's reserved, but only reserved for an hour, the optimal way to use SF Compute is to just buy at the market price, but set a limit price that is much higher. So you can set a limit price of four dollars and say, oh, if the market ever happens to spike up to four dollars, then don't buy. I don't want to buy at that price for an hour. But otherwise, just buy at the cheapest price. And if you're comfortable with the volatility of it, you're actually going to get really good prices, like close to a dollar an hour or so, sometimes down to 80 cents or whatever. You said four, though. Yeah. So that's the thing. You want to lower the limit. So four is your max price. Four is where you basically want to pull the plug and say don't do it, because the actual average price, you know, the preemptible price, doesn't actually look like that. So what you're doing when you're saying four is: always, always, always give me this compute. Continue to buy every hour. Don't preempt me. Don't kick me off. I want this compute; just buy at the preemptible price, but never kick me off. The only time you get kicked off is if there is a big price spike. You know, let's say one day out of the year there's a four-dollar-an-hour price because of some weird fluke or something. In all other periods of time, you're actually getting a much lower price. It makes sense. The average cost that you're actually paying is way better. 
And your trade-off here is that you don't literally know what price you're going to get. So it's volatile. But historically, everyone who's done this has gotten wildly better average prices. And this is one of the clever things you can do with the market. If you're willing to make those trade-offs, you can get a lot of really good prices. You can also do other things, like you can only buy at night, for example. The price goes down at night. And so you can say, oh, I only want to buy, you know, if the price is lower than 90 cents. So if you have some long-running job, you can make it only run at 90 cents, and then you recover back and so on. Yeah. So what you can kind of create is like a spot instance, which is what the CPU world has. Yes. But you've created a system where you can kind of manufacture the exact profile that you want. Exactly. That is not just whatever the hyperscalers offer you, which is usually just one thing. Correct. SF Compute is like the power tool. The underlying primitive of hourly compute is there. Correct. Yeah, it's pretty interesting. I've often asked OpenAI, you know, all these guys, Claude as well. They do batch APIs. So it's half off of whatever your thing is. Yeah. And the only contract is that we'll return in 24 hours. Sure. Right. And I was like, 24 hours is good. But sometimes I want one hour. I want four hours. I want something. And so based off of SF Compute's system, you can actually kind of create that kind of guarantee. Totally. That would be, you know, not 24, but within eight hours, within four hours, like half of a workday. Yes. I can return your results to you. And if your latency requirements are that low, actually it's fine. Yes. Correct. Yeah. You can carve that out. You can financially engineer that on SFC. Yeah. Yeah. 
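The limit-price strategy discussed above (buy every hour at the market price unless it rises past your limit) can be written as a short simulation. This is a minimal sketch, not SF Compute's API: `buy_hourly`, `BuyLog`, and the price series are hypothetical, and the code assumes an order fills whenever the hourly price is at or below the limit.

```python
from dataclasses import dataclass, field


@dataclass
class BuyLog:
    hours_bought: int = 0
    total_cost: float = 0.0
    skipped_hours: list = field(default_factory=list)

    @property
    def average_price(self):
        """Average price paid per purchased hour, or None if nothing was bought."""
        return self.total_cost / self.hours_bought if self.hours_bought else None


def buy_hourly(market_prices, limit_price):
    """Reserve each hour separately, skipping hours priced above the limit."""
    log = BuyLog()
    for hour, price in enumerate(market_prices):
        if price <= limit_price:
            log.hours_bought += 1
            log.total_cost += price
        else:
            log.skipped_hours.append(hour)  # job is paused for this hour
    return log


# Made-up prices in dollars per GPU-hour, including one spike.
prices = [1.00, 0.80, 1.20, 4.50, 0.90]

always_on = buy_hourly(prices, limit_price=4.00)   # "never kick me off" profile
cheap_only = buy_hourly(prices, limit_price=0.90)  # "only run when it's cheap" profile

print(always_on.hours_bought, always_on.skipped_hours)    # 4 [3]
print(cheap_only.hours_bought, cheap_only.skipped_hours)  # 2 [0, 2, 3]
```

A high limit keeps the job running through everything except the spike, while a low limit trades uptime for price, so the job has to checkpoint and resume. Both profiles come out of the same hourly primitive.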
I mean, I think to me that unlocks a lot of agent use cases that I want, which is like, yeah, work in the background, but I don't want you to take a day. Yeah. Correct. Take a couple of hours or something. Yeah. This touches a lot of my background, because I used to be a derivatives trader. Yeah. And this is a forward market. Yeah. A futures, forward market, whatever you call it. Not a future. Very explicitly not a future. Not yet a future. Yes. But I don't know if you have any other points to talk about. So you recognize that you are, you know, a marketplace, and you've hired, I met Alex Epstein at your launch event, and you're building out the financialization of GPUs. Yeah. So part of that's legal. Mm-hmm. Totally. Part of that is listing on an exchange. Yep. Maybe you're the exchange. I don't know how that works, but just talk to me about that. From the legal, the standardization, where is this all headed? You know, is this fully listed on the Chicago Mercantile Exchange or whatever? What we're trying to do is create an underlying spot market that gives you an index price that you can use. And then with that index price, you can create a cash-settled future. And with a cash-settled future, you can go back to the data centers and say, lock in your price now and de-risk your entire position, which lets you get a cheaper cost of capital and so on. And that, we think, will improve the entire industry, because the marginal cost of compute is the risk. It's risk, as shown by that graph and basically every part of this conversation. It's risk that causes the price to be all sorts of funky. And we think a future is the correct solution to this. So that's the eventual goal. Right now you have to make the underlying spot market in order to make this occur. And then to make the spot market work, you actually have to solve a lot of technology problems. 
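To make the "lock in your price now" point concrete, here is a toy calculation of how a cash-settled future would offset spot revenue for a data center that sells compute. It is a sketch under heavy assumptions: `hedged_revenue` is a hypothetical helper, the contract settles on exactly the same index the compute is sold at, and margin, fees, and basis risk are ignored. No such contract is described as existing in this conversation.

```python
def hedged_revenue(locked_price, index_price, gpu_hours):
    """Total revenue for a seller who is short a cash-settled future.

    locked_price: price per GPU-hour agreed in the future (assumed)
    index_price:  spot index per GPU-hour at settlement (assumed)
    """
    spot_revenue = index_price * gpu_hours                         # sold on the spot market
    futures_settlement = (locked_price - index_price) * gpu_hours  # cash paid to or by the seller
    return spot_revenue + futures_settlement


# Whatever the index does, the seller ends up with locked_price * gpu_hours.
for index_price in (1.00, 2.00, 3.00):
    print(index_price, hedged_revenue(2.00, index_price, 1_000))
```

Under these assumptions every scenario prints 2000.0: when the index falls, the settlement makes up the difference, and when it rises, the seller gives up the upside. That predictability is what would let a lender treat the cluster's revenue as lower risk.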
You really cannot make a spot market work if you don't run the clusters, if you don't have control over them, if you don't know how to audit them, because these are supercomputers, not soybeans. They have to work. In a way that like, it's just a lot simpler to deliver a soybean than it is to deliver a supercomputer. I don't know. Talk to the soybean guys. Sure. You know? Yeah. But you have to have a delivery mechanism. Your delivery mechanism, like somebody somewhere has to actually get the compute at some point and it actually has to work. And it is really complicated. And so that is the other part of our business that we go and we build a bare metal infrastructure stack that goes. And then also we do auditing of all the clusters. You sort of de-risk the technical perspective and that allows you to eventually de-risk the financial perspective. And that is kind of the pitch of SF Compute. Yeah. I'll double-click on the auditing of the clusters. This is something I've had conversations with Yi Tay on. He started Reka, and I think he had a blog post which kind of shone the light a little bit on how unreliable some clusters are versus others. Correct. Yeah. And sometimes you kind of have to season them and age them a little bit to find the bad cards. You have to burn them in. Yeah. So what do you do to audit them? There's like a burn-in process, a suite of tests, and then active checking and passive checking. The burn-in process is where you typically run LINPACK. LINPACK is this thing that's like a bunch of linear algebra equations that you stress test the GPUs with. This is a proprietary thing that you wrote? No, no, no. LINPACK is like the most common form of burn-in. If you just type in burn-in, typically when people say burn-in, they literally just mean LINPACK. There's like an NVIDIA reference version of this. Again, NVIDIA could run this before they ship, but now the customers have to do it. It's annoying. You're not just checking for the GPU itself.
You're checking like the whole component, all the hardware. And it's a lot of work. It's an integration test. It's an integration test. Yeah. So what you're doing when you're running LINPACK or burn-in in general is you're stress testing the GPUs for some period of time, 48 hours, for example, maybe seven days or so on. And you're just trying to kill all the dead GPUs or any components in the system that are broken. And we've had experiences where we ran LINPACK on a cluster and it rounds out, sort of comes offline when you run LINPACK. This is a pretty good sign that maybe there is a problem with this cluster. Yeah. So LINPACK is like the most common sort of standard test. But then beyond that, we have a series of performance tests that replicate a much more realistic environment as well. If LINPACK works at all, then you run the next set of tests. And then while the GPUs are in operation, you're also going through and you're doing active tests and passive tests. Passive tests are things that are running in the background while somebody else is running, while like some other workload is running. And active tests are during like idle periods. You're running some sort of check that would otherwise sort of interrupt something. And then the active tests will take something offline, basically. Or a passive check might mark it to get taken offline later and so on. And then the thing that we are working on, that we have working partially but not entirely, is automated refunds, which is basically like, it is the case that the hardware breaks so much. And there's only so much that we can do, and it affects pretty much the entire industry.
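The difference between the active and passive checks described above can be sketched as a small state machine. This is an illustration of the description in the conversation, not SF Compute's implementation: the `Node` and `NodeState` types, the function names, and the `probe` callbacks are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class NodeState(Enum):
    HEALTHY = "healthy"
    MARKED = "marked_for_offline"   # flagged by a passive check, still serving
    OFFLINE = "offline"


@dataclass
class Node:
    name: str
    busy: bool                      # True while a customer workload is running
    state: NodeState = NodeState.HEALTHY


def passive_check(node: Node, probe: Callable[[Node], bool]) -> None:
    """Non-disruptive check that runs alongside a workload.

    On failure it only marks the node, so the workload is not interrupted.
    """
    if node.state is NodeState.HEALTHY and not probe(node):
        node.state = NodeState.MARKED


def active_check(node: Node, probe: Callable[[Node], bool]) -> bool:
    """Disruptive check that runs only while the node is idle.

    Returns True if the check ran. On failure the node goes offline at once.
    """
    if node.busy or node.state is NodeState.OFFLINE:
        return False
    if not probe(node):
        node.state = NodeState.OFFLINE
    return True


def on_workload_finished(node: Node) -> None:
    """When the workload ends, take a previously marked node offline."""
    node.busy = False
    if node.state is NodeState.MARKED:
        node.state = NodeState.OFFLINE
```

In this sketch a failed passive check never interrupts the running job. The node is drained only once the workload finishes, which matches "a passive check might mark it to get taken offline later".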
So a pretty common thing that I think happens to kind of everybody in the space is a customer comes online, they experience your cluster, and your cluster has the same problem that like any cluster has, or, I mean, it's a different problem every time, but they experience one of the problems of HPC. And then their experience is bad. And you have to like negotiate a refund or some other thing like this. It's always case by case. And like, yeah, a lot of people just eat the cost. Correct. So one of the nice things about a market that we can do as we get bigger, and have been doing as we get bigger, is we can immediately give you something else. And then also we can automatically refund you. And you're still gonna experience it, like the hardware problems aren't going away until the underlying vendors fix things. But honestly, I don't think that's likely, because you're always pushing the limits of HPC. This is the case when trying to build a supercomputer. That's one of the nice things that we can do: we can switch you out for somebody else somewhere, and then automatically refund you or prorate or whatever the correct move is. One of the things that you said in a conversation with me was like, you know, a provider is good when they guarantee automatic refunds. Which doesn't happen. But yeah, that's in our contract with all the underlying cloud providers. You built it in already. Yeah. So we have a quite strict SLA that we pass on to you. The reason why
Three Buddy Problem - Episode 38: On the show this week, we look at a hefty batch of Microsoft zero-days exploited in the wild, iOS 18.3.2 fixing an exploited WebKit bug, a mysterious Unpatched.ai being credited with Microsoft Access RCE flaws, and OpenAI lobbying for the US to ban China's DeepSeek. Plus, discussion on a Binarly technical paper with a new approach to finding UEFI bootkits, Mandiant flagging custom backdoors on Juniper routers, and MEV 'sandwich attacks' front-running cryptocurrency transactions. Cast: Juan Andres Guerrero-Saade (https://twitter.com/juanandres_gs), Costin Raiu (https://twitter.com/craiu) and Ryan Naraine (https://twitter.com/ryanaraine).
Our thoughts on Zero Trust World, and just a little bit of news. Of course we covered some firmware and UEFI without Paul! Visit https://www.securityweekly.com/psw for all the latest episodes! Show Notes: https://securityweekly.com/psw-862
SANS Internet Stormcenter Daily Network/Cyber Security and Information Security Stormcast
In this episode, we cover how to use honeypot data to keep your offensive infrastructure alive longer, three critical vulnerabilities in SimpleHelp that must be patched now, and an interesting vulnerability affecting many systems allowing UEFI Secure Boot bypass. Leveraging Honeypot Data for Offensive Security Operations [Guest Diary] A recent guest diary on the SANS Internet Storm Center discusses how offensive security professionals can utilize honeypot data to enhance their operations. The diary highlights the detection of scans from multiple IP addresses, emphasizing the importance of monitoring non-standard user-agent strings in web requests. https://isc.sans.edu/diary/Leveraging%20Honeypot%20Data%20for%20Offensive%20Security%20Operations%20%5BGuest%20Diary%5D/31596 Security Vulnerabilities in SimpleHelp 5.5.7 and Earlier SimpleHelp has released version 5.5.8 to address critical security vulnerabilities present in versions 5.5.7 and earlier. Users are strongly advised to upgrade to the latest version to prevent potential exploits. Detailed information and upgrade instructions are available on SimpleHelp's official website. https://simple-help.com/kb---security-vulnerabilities-01-2025#send-us-your-questions Under the Cloak of UEFI Secure Boot: Introducing CVE-2024-7344 ESET researchers have identified a new vulnerability, CVE-2024-7344, that allows attackers to bypass UEFI Secure Boot on most UEFI-based systems. This flaw enables the execution of untrusted code during system boot, potentially leading to the deployment of malicious UEFI bootkits. Affected users should apply available patches to mitigate this risk. https://www.welivesecurity.com/en/eset-research/under-cloak-uefi-secure-boot-introducing-cve-2024-7344/
AIs in Love, UEFI, Fortinet, GoDaddy, Juggalos, Aaran Leyland, and More. In this edition of the Security Weekly News. Visit https://www.securityweekly.com/swn for all the latest episodes! Show Notes: https://securityweekly.com/swn-443
SANS ISC Internship Setup: AWS DShield Sensor + DShield SIEM [Guest Diary] https://isc.sans.edu/diary/SANS%20ISC%20Internship%20Setup%3A%20AWS%20DShield%20Sensor%20%2B%20DShield%20SIEM%20%5BGuest%20Diary%5D/31480 From a Regular Infostealer to its Obfuscated Version https://isc.sans.edu/diary/From%20a%20Regular%20Infostealer%20to%20its%20Obfuscated%20Version/31484 Credit Card Skimmer Malware Targeting Magento Checkout Pages https://blog.sucuri.net/2024/11/credit-card-skimmer-malware-targeting-magento-checkout-pages.html LogoFAIL Exploited to Deploy Bootkitty, the first UEFI bootkit for Linux https://www.binarly.io/blog/logofail-exploited-to-deploy-bootkitty-the-first-uefi-bootkit-for-linux Stickers: https://isc.sans.edu/stickers.html (code PODCAST)
This week we pontificate on Gnome OS and whether KDE and Gnome really need their own distros, the much-delayed progress in Wayland development that's finally happening, and whether there's really a trend from Ubuntu back to Debian. We discuss the Pi CM5 announcement, the new UEFI bootkit announcement, and the leaked Steam Controller designs. For tips we have dstat for system monitoring, and SSH agent forwarding to forward your SSH keys on remote systems. The show notes are at https://bit.ly/4idT1Qb and until next week! Host: Jonathan Bennett Co-Host: Rob Campbell Want access to the video version and exclusive features? Become a member of Club TWiT today! https://twit.tv/clubtwit Club TWiT members can discuss this episode and leave feedback in the Club TWiT Discord.