Podcasts about PostgreSQL

  • 15 podcasts
  • 17 episodes
  • 52m average duration
  • Infrequent episodes
  • Latest: Sep 5, 2023

POPULARITY

[Popularity chart, 2017–2024]


Best podcasts about PostgreSQL

Latest podcast episodes about PostgreSQL

Screaming in the Cloud
The Evolution of OpenTelemetry with Austin Parker

Sep 5, 2023 • 40:09


Austin Parker, Community Maintainer at OpenTelemetry, joins Corey on Screaming in the Cloud to discuss OpenTelemetry's mission in the world of observability. Austin explains how the OpenTelemetry community was able to scale the OpenTelemetry project to a commercial offering, and the way Open Telemetry is driving innovation in the data space. Corey and Austin also discuss why Austin decided to write a book on OpenTelemetry, and the book's focus on the evergreen applications of the tool. About AustinAustin Parker is the OpenTelemetry Community Maintainer, as well as an event organizer, public speaker, author, and general bon vivant. They've been a part of OpenTelemetry since its inception in 2019.Links Referenced: OpenTelemetry: https://opentelemetry.io/ Learning OpenTelemetry early release: https://www.oreilly.com/library/view/learning-opentelemetry/9781098147174/ Page with Austin's social links: https://social.ap2.io TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Look, I get it. Folks are being asked to do more and more. Most companies don't have a dedicated DBA because that person now has a full time job figuring out which one of AWS's multiple managed database offerings is right for every workload. Instead, developers and engineers are being asked to support, and heck, if time allows, optimize their databases. That's where OtterTune comes in. Their AI is your database co-pilot for MySQL and PostgresSQL on Amazon RDS or Aurora. It helps improve performance by up to four x OR reduce costs by 50 percent – both of those are decent options. Go to ottertune dot com to learn more and start a free trial. That's O-T-T-E-R-T-U-N-E dot com.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It's been a few hundred episodes since I had Austin Parker on to talk about the things that Austin cares about. But it's time to rectify that. Austin is the community maintainer for OpenTelemetry, which is a CNCF project. If you're unfamiliar with, we're probably going to fix that in short order. Austin, Welcome back, it's been a month of Sundays.Austin: It has been a month-and-a-half of Sundays. A whole pandemic-and-a-half.Corey: So, much has happened since then. I tried to instrument something with OpenTelemetry about a year-and-a-half ago, and in defense to the project, my use case is always very strange, but it felt like—a lot of things have sharp edges, but it felt like this had so many sharp edges that you just pivot to being a chainsaw, and I would have been at least a little bit more understanding of why it hurts so very much. But I have heard from people that I trust that the experience has gotten significantly better. Before we get into the nitty-gritty of me lobbing passive-aggressive bug reports at you have for you to fix in a scenario in which you can't possibly refuse me, let's start with the beginning. What is OpenTelemetry?Austin: That's a great question. Thank you for asking it. So, OpenTelemetry is an observability framework. 
It is run by the CNCF, you know, home of such wonderful award-winning technologies as Kubernetes, and you know, the second biggest source of YAML in the known universe [clear throat].Corey: On some level, it feels like that is right there with hydrogen as far as unlimited resources in our universe.Austin: It really is. And, you know, as we all know, there are two things that make, sort of, the DevOps and cloud world go around: one of them being, as you would probably know, AWS bills; and the second being YAML. But OpenTelemetry tries to kind of carve a path through this, right, because we're interested in observability. And observability, for those that don't know or have been living under a rock or not reading blogs, it's a lot of things. It's a—but we can generally sort of describe it as, like, this is how you understand what your system is doing.I like to describe it as, it's a way that we can model systems, especially complex, distributed, or decentralized software systems that are pretty commonly found in larg—you know, organizations of every shape and size, quite often running on Kubernetes, quite often running in public or private clouds. And the goal of observability is to help you, you know, model this system and understand what it's doing, which is something that I think we can all agree, a pretty important part of our job as software engineers. Where OpenTelemetry fits into this is as the framework that helps you get the telemetry data you need from those systems, put it into a universal format, and then ship it off to some observability back-end, you know, a Prometheus or a Datadog or whatever, in order to analyze that data and get answers to your questions you have.Corey: From where I sit, the value of OTel—or OpenTelemetry; people in software engineering love abbreviations that are impenetrable from the outside, so of course, we're going to lean into that—but what I found for my own use case is the shining value prop was that I could instrument an application with OTel—in theory—and then send whatever I wanted that was emitted in terms of telemetry, be it events, be it logs, be it metrics, et cetera, and send that to any or all of a curation of vendors on a case-by-case basis, which meant that suddenly it was the first step in, I guess, an observability pipeline, which increasingly is starting to feel like a milit—like an industrial-observability complex, where there's so many different companies out there, it seems like a good approach to use, to start, I guess, racing vendors in different areas to see which performs better. One of the challenges I've had with that when I started down that path is it felt like every vendor who was embracing OTel did it from a perspective of their implementation. Here's how to instrument it to—send it to us because we're the best, obviously. And you're a community maintainer, despite working at observability vendors yourself. You have always been one of those community-first types where you care more about the user experience than you do this quarter for any particular employer that you have, which to be very clear, is intended as a compliment, not a terrifying warning. It's why you have this authentic air to you and why you are one of those very few voices that I trust in a space where normally I need to approach it with significant skepticism. 
How do you see the relationship between vendors and OpenTelemetry?Austin: I think the hard thing is that I know who signs my paychecks at the end of the day, right, and you always have, you know, some level of, you know, let's say bias, right? Because it is a bias to look after, you know, them who brought you to the dance. But I think you can be responsible with balancing, sort of, the needs of your employer, and the needs of the community. You know, the way I've always described this is that if you think about observability as, like, a—you know, as a market, what's the total addressable market there? It's literally everyone that uses software; it's literally every software company.Which means there's plenty of room for people to make their numbers and to buy and sell and trade and do all this sort of stuff. And by taking that approach, by taking sort of the big picture approach and saying, “Well, look, you know, there's going to be—you know, of all these people, there are going to be some of them that are going to use our stuff and there are some of them that are going to use our competitor's stuff.” And that's fine. Let's figure out where we can invest… in an OpenTelemetry, in a way that makes sense for everyone and not just, you know, our people. So, let's build things like documentation, right?You know, one of the things I'm most impressed with, with OpenTelemetry over the past, like, two years is we went from being, as a project, like, if you searched for OpenTelemetry, you would go and you would get five or six or ten different vendor pages coming up trying to tell you, like, “This is how you use it, this is how you use it.” And what we've done as a community is we've said, you know, “If you go looking for documentation, you should find our website. You should find our resources.” And we've managed to get the OpenTelemetry website to basically rank above almost everything else when people are searching for help with OpenTelemetry. And that's been really good because, one, it means that now, rather than vendors or whoever coming in and saying, like, “Well, we can do this better than you,” we can be like, “Well, look, just, you know, put your effort here, right? It's already the top result. It's already where people are coming, and we can prove that.”And two, it means that as people come in, they're going to be put into this process of community feedback, where they can go in, they can look at the docs, and they can say, “Oh, well, I had a bad experience here,” or, “How do I do this?” And we get that feedback and then we can improve the docs for everyone else by acting on that feedback, and the net result of this is that more people are using OpenTelemetry, which means there are more people kind of going into the tippy-tippy top of the funnel, right, that are able to become a customer of one of these myriad observability back ends.Corey: You touched on something very important here, when I first was exploring this—you may have been looking over my shoulder as I went through this process—my impression initially was, oh, this is a ‘CNCF project' in quotes, where—this is not true universally, of course, but there are cases where it clearly—is where this is an, effectively, vendor-captured project, not necessarily by one vendor, but by an almost consortium of them. And that was my takeaway from OpenTelemetry. It was conversations with you, among others, that led me to believe no, no, this is not in that vein. This is clearly something that is a win. 
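What makes that vendor neutrality concrete: an application is instrumented once, and the same OTLP stream goes to whichever back-end the exporter endpoint points at. A minimal sketch using the OpenTelemetry Python SDK; the endpoint, service name, and span name are placeholders rather than any vendor's specifics.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Instrument once; choose (or race) vendors by pointing this endpoint at a
# collector or at any OTLP-speaking back-end.
provider = TracerProvider(resource=Resource.create({"service.name": "demo-api"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo-api")
with tracer.start_as_current_span("handle-request"):
    pass  # application work happens here
```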
There are just a whole bunch of vendors more-or-less falling all over themselves, trying to stake out thought leadership and imply ownership, on some level, of where these things go. But I definitely left with a sense that this is bigger than any one vendor.Austin: I would agree. I think, to even step back further, right, there's almost two different ways that I think vendors—or anyone—can approach OpenTelemetry, you know, from a market perspective, and one is to say, like, “Oh, this is socializing, kind of, the maintenance burden of instrumentation.” Which is a huge cost for commercial players, right? Like, if you're a Datadog or a Splunk or whoever, you know, you have these agents that you go in and they rip telemetry out of your web servers, out of your gRPC libraries, whatever, and it costs a lot of money to pay engineers to maintain those instrumentation agents, right? And the cynical take is, oh, look at all these big companies that are kind of like pushing all that labor onto the open-source community, and you know, I'm not casting any aspersions here, like, I do think that there's an element of truth to it though because, yeah, that is a huge fixed cost.And if you look at the actual lived reality of people and you look at back when SignalFx was still a going concern, right, and they had their APM agents open-sourced, you could go into the SignalFx repo and diff, like, their [Node Express 00:10:15] instrumentation against the Datadog Node Express instrumentation, and it's almost a hundred percent the same, right? Because it's truly a commodity. There's no—there's nothing interesting about how you get that telemetry out. The interesting stuff all happens after you have the telemetry and you've sent it to some back-end, and then you can, you know, analyze it and find interesting things. So, yeah, like, it doesn't make sense for there to be five or six or eight different companies all competing to rebuild the same wheels over and over and over and over when they don't have to.I think the second thing that some people are starting to understand is that it's like, okay, let's take this a step beyond instrumentation, right? Because the goal of OpenTelemetry really is to make sure that this instrumentation is native so that you don't need a third-party agent, you don't need some other process or jar or whatever that you drop in and it instruments stuff for you. The JVM should provide this, your web framework should provide this, your RPC library should provide this right? Like, this data should come from the code itself and be in a normalized fashion that can then be sent to any number of vendors or back ends or whatever. And that changes how—sort of, the competitive landscape a lot, I think, for observability vendors because rather than, kind of, what you have now, which is people will competing on, like, well, how quickly can I throw this agent in and get set up and get a dashboard going, it really becomes more about, like, okay, how are you differentiating yourself against every other person that has access to the same data, right? 
And you get more interesting use cases and how much more interesting analysis features, and that results in more innovation in, sort of, the industry than we've seen in a very long time.Corey: For me, just from the customer side of the world, one of the biggest problems I had with observability in my career as an SRE-type for years was you would wind up building your observability pipeline around whatever vendor you had selected and that meant emphasizing the things they were good at and de-emphasizing the things that they weren't. And sometimes it's worked to your benefit; usually not. But then you always had this question when it got things that touched on APM or whatnot—or Application Performance Monitoring—where oh, just embed our library into this. Okay, great. But a year-and-a-half ago, my exposure to this was on an application that I was running in distributed fashion on top of AWS Lambda.So great, you can either use an extension for this or you can build in the library yourself, but then there's always a question of precedence where when you have multiple things that are looking at this from different points of view, which one gets done first? Which one is going to see the others? Which one is going to enmesh the other—enclose the others in its own perspective of the world? And it just got incredibly frustrating. One of the—at least for me—bright lights of OTel was that it got away from that where all of the vendors receiving telemetry got the same view.Austin: Yeah. They all get the same view, they all get the same data, and you know, there's a pretty rich collection of tools that we're starting to develop to help you build those pipelines yourselves and really own everything from the point of generation to intermediate collection to actually outputting it to wherever you want to go. For example, a lot of really interesting work has come out of the OpenTelemetry collector recently; one of them is this feature called Connectors. And Connectors let you take the output of certain pipelines and route them as inputs to another pipeline. And as part of that connection, you can transform stuff.So, for example, let's say you have a bunch of [spans 00:14:05] or traces coming from your API endpoints, and you don't necessarily want to keep all those traces in their raw form because maybe they aren't interesting or maybe there's just too high of a volume. So, with Connectors, you can go and you can actually convert all of those spans into metrics and export them to a metrics database. You could continue to save that span data if you want, but you have options now, right? Like, you can take that span data and put it into cold storage or put it into, like, you know, some sort of slow blob storage thing where it's not actively indexed and it's slow lookups, and then keep a metric representation of it in your alerting pipeline, use metadata exemplars or whatever to kind of connect those things back. And so, when you do suddenly see it's like, “Oh, well, there's some interesting p99 behavior,” or we're hitting an alert or violating an SLO or whatever, then you can go back and say, like, “Okay, well, let's go dig through the slow da—you know, let's look at the cold data to figure out what actually happened.”And those are features that, historically, you would have needed to go to a big, important vendor and say, like, “Hey, here's a bunch of money,” right? 
Like, “Do this for me.” Now, you have the option to kind of do all that more interesting pipeline stuff yourself and then make choices about vendors based on, like, who is making a tool that can help me with the problem that I have? Because most of the time, I don't—I feel like we tend to treat observability tools as—it depends a lot on where you sit in the org—but you certainly seen this movement towards, like, “Well, we don't want a tool; we want a platform. We want to go to Lowe's and we want to get the 48-in-one kit that has a bunch of things in it. And we're going to pay for the 48-in-one kit, even if we only need, like, two things or three things out of it.”OpenTelemetry lets you kind of step back and say, like, “Well, what if we just got, like, really high-quality tools for the two or three things we need, and then for the rest of the stuff, we can use other cheaper options?” Which is, I think, really attractive, especially in today's macroeconomic conditions, let's say.Corey: One thing I'm trying to wrap my head around because we all find when it comes to observability, in my experience, it's the parable of three blind people trying to describe an elephant by touch; depending on where you are on the elephant, you have a very different perspective. What I'm trying to wrap my head around is, what is the vision for OpenTelemetry? Is it specifically envisioned to be the agent that runs wherever the workload is, whether it's an agent on a host or a layer in a Lambda function, or a sidecar or whatnot in a Kubernetes cluster that winds up gathering and sending data out? Or is the vision something different? Because part of what you're saying aligns with my perspective on it, but other parts of it seem to—that there's a misunderstanding somewhere, and it's almost certainly on my part.Austin: I think the long-term vision is that you as a developer, you as an SRE, don't even have to think about OpenTelemetry, that when you are using your container orchestrator or you are using your API framework or you're using your Managed API Gateway, or any kind of software that you're building something with, that the telemetry data from that software is emitted in an OpenTelemetry format, right? And when you are writing your code, you know, and you're using gRPC, let's say, you could just natively expect that OpenTelemetry is kind of there in the background and it's integrated into the actual libraries themselves. And so, you can just call the OpenTelemetry API and it's part of the standard library almost, right? You add some additional metadata to a span and say, like, “Oh, this is the customer ID,” or, “This is some interesting attribute that I want to track for later on,” or, “I'm going to create a histogram here or counter,” whatever it is, and then all that data is just kind of there, right, invisible to you unless you need it. And then when you need it, it's there for you to kind of pick up and send off somewhere to any number of back-ends or databases or whatnot that you could then use to discover problems or better model your system.That's the long-term vision, right, that it's just there, everyone uses it. It is a de facto and du jour standard. I think in the medium term, it does look a little bit more like OpenTelemetry is kind of this Swiss army knife agent that's running on—inside cars in Kubernetes or it's running on your EC2 instance. 
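A rough sketch of the developer experience Austin describes here, using the OpenTelemetry Python API; the attribute and instrument names are invented for illustration.

```python
from opentelemetry import trace, metrics

# In the long-term vision, these calls are all the application touches: the
# libraries underneath already emit OpenTelemetry-format data natively.
tracer = trace.get_tracer("checkout")
meter = metrics.get_meter("checkout")
orders = meter.create_counter("orders.processed")
order_latency = meter.create_histogram("order.duration", unit="ms")

def process_order(customer_id: str, elapsed_ms: float) -> None:
    with tracer.start_as_current_span("process-order") as span:
        # Attach the interesting business metadata to the span...
        span.set_attribute("customer.id", customer_id)
        # ...and record instruments alongside it. The data sits there,
        # invisible, until some back-end needs it.
        orders.add(1)
        order_latency.record(elapsed_ms)
```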
Until we get to the point of everyone just agrees that we're going to use OpenTelemetry protocol for the data and we're going to use all your stuff and we just natively emit it, then that's going to be how long we're in that midpoint. But that's sort of the medium and long-term vision I think. Does that track?Corey: It does. And I'm trying to equate this to—like the evolution back in the Stone Age was back when I was first getting started, Nagios was the gold standard. It was kind of the original Call of Duty. And it was awful. There were a bunch of problems with it, but it also worked.And I'm not trying to dunk on the people who built that. We all stand on the shoulders of giants. It was an open-source project that was awesome doing exactly what it did, but it was a product built for a very different time. It completely had the wheels fall off as soon as you got to things were even slightly ephemeral because it required this idea of the server needed to know where all of the things that was monitoring lived as an individual host basis, so there was this constant joy of, “Oh, we're going to add things to a cluster.” Its perspective was, “What's a cluster?” Or you'd have these problems with a core switch going down and suddenly everything else would explode as well.And even setting up an on-call rotation for who got paged when was nightmarish. And a bunch of things have evolved since then, which is putting it mildly. Like, you could say that about fire, the invention of the wheel. Yeah, a lot of things have evolved since the invention of the wheel, and here we are tricking sand into thinking. But we find ourselves just—now it seems that the outcome of all of this has been instead of one option that's the de facto standard that's kind of terrible in its own ways, now, we have an entire universe of different products, many of which are best-of-breed at one very specific thing, but nothing's great at everything.It's the multifunction printer conundrum, where you find things that are great at one or two things at most, and then mediocre at best at the rest. I'm excited about the possibility for OpenTelemetry to really get to a point of best-of-breed for everything. But it also feels like the money folks are pushing for consolidation, if you believe a lot of the analyst reports around this of, “We already pay for seven different observability vendors. How about we knock it down to just one that does all of these things?” Because that would be terrible. What do you land on that?Austin: Well, as I intu—or alluded to this earlier, I think the consolidation in the observability space, in general, is very much driven by that force you just pointed out, right? The buyers want to consolidate more and more things into single tools. And I think there's a lot of… there are reasons for that that—you know, there are good reasons for that, but I also feel like a lot of those reasons are driven by fundamentally telemetry-side concerns, right? So like, one example of this is if you were Large Business X, and you see—you are an engineering director and you get a report, that's like, “We have eight different metrics products.” And you're like, “That seems like a lot. Let's just use Brand X.”And Brand X will tell you very, very happily tell you, like, “Oh, you just install our thing everywhere and you can get rid of all these other tools.” And usually, there's two reasons that people pick tools, right? 
One reason is that they are forced to and then they are forced to do a bunch of integration work to get whatever the old stuff was working in the new way, but the other reason is because they tried a bunch of different things and they found the one tool that actually worked for them. And what happens invariably in these sort of consolidation stories is, you know, the new vendor comes in on a shining horse to consolidate, and you wind up instead of eight distinct metrics tools, now you have nine distinct metrics tools because there's never any bandwidth for people to go back and, you know—you're Nagios example, right, Nag—people still use Nagios every day. What's the economic justification to take all those Nagios installs, if they're working, and put them into something else, right?What's the economic justification to go and take a bunch of old software that hasn't been touched for ten years that still runs and still does what needs to do, like, where's the incentive to go and re-instrument that with OpenTelemetry or anything else? It doesn't necessarily exist, right? And that's a pretty, I think, fundamental decision point in everyone's observability journey, which is what do you do about all the old stuff? Because most of the stuff is the old stuff and the worst part is, most of the stuff that you make money off of is the old stuff as well. So, you can't ignore it, and if you're spending, you know, millions of millions of dollars on the new stuff—like, there was a story that went around a while ago, I think, Coinbase spent something like, what, $60 million on Datadog… I hope they asked for it in real money and not Bitcoin. But—Corey: Yeah, something I've noticed about all the vendors, and even Coinbase themselves, very few of them actually transact in cryptocurrency. It's always cash on the barrelhead, so to speak.Austin: Yeah, smart. But still, like, that's an absurd amount of money [laugh] for any product or service, I would argue, right? But that's just my perspective. I do think though, it goes to show you that you know, it's very easy to get into these sort of things where you're just spending over the barrel to, like, the newest vendor that's going to come in and solve all your problems for you. And just, it often doesn't work that way because most places aren't—especially large organizations—just aren't built in is sort of like, “Oh, we can go through and we can just redo stuff,” right? “We can just roll out a new agent through… whatever.”We have mainframes [unintelligible 00:25:09], mainframes to thinking about, you have… in many cases, you have an awful lot of business systems that most, kind of, cloud people don't like, think about, right, like SAP or Salesforce or ServiceNow, or whatever. And those sort of business process systems are actually responsible for quite a few things that are interesting from an observability point of view. But you don't see—I mean, hell, you don't even see OpenTelemetry going out and saying, like, “Oh, well, here's the thing to let you know, observe Apex applications on Salesforce,” right? It's kind of an undiscovered country in a lot of ways and it's something that I think we will have to grapple with as we go forward. In the shorter term, there's a reason that OpenTelemetry mostly focuses on cloud-native applications because that's a little bit easier to actually do what we're trying to do on them and that's where the heat and light is. 
But once we get done with that, then the sky is the limit.[midroll 00:26:11]Corey: It still feels like OpenTelemetry is evolving rapidly. It's certainly not, I don't want to say it's not feature complete, which, again, what—software is never done. But it does seem like even quarter-to-quarter or month-to-month, its capabilities expand massively. Because you apparently enjoy pain, you're in the process of writing a book. I think it's in early release or early access that comes out next year, 2024. Why would you do such a thing?Austin: That's a great question. And if I ever figure out the answer I will tell you.Corey: Remember, no one wants to write a book; they want to have written the book.Austin: And the worst part is, is I have written the book and for some reason, I went back for another round. I—Corey: It's like childbirth. No one remembers exactly how horrible it was.Austin: Yeah, my partner could probably attest to that. Although I was in the room, and I don't think I'd want to do it either. So, I think the real, you know, the real reason that I decided to go and kind of write this book—and it's Learning OpenTelemetry; it's in early release right now on the O'Reilly learning platform and it'll be out in print and digital next year, I believe, we're targeting right now, early next year.But the goal is, as you pointed out so eloquently, OpenTelemetry changes a lot. And it changes month to month sometimes. So, why would someone decide—say, “Hey, I'm going to write the book about learning this?” Well, there's a very good reason for that and it is that I've looked at a lot of the other books out there on OpenTelemetry, on observability in general, and they talk a lot about, like, here's how you use the API. Here's how you use the SDK. Here's how you make a trace or a span or a log statement or whatever. And it's very technical; it's very kind of in the weeds.What I was interested in is saying, like, “Okay, let's put all that stuff aside because you don't necessarily…” I'm not saying any of that stuff's going to change. And I'm not saying that how to make a span is going to change tomorrow; it's not, but learning how to actually use something like OpenTelemetry isn't just knowing how to create a measurement or how to create a trace. It's, how do I actually use this in a production system? To my point earlier, how do I use this to get data about, you know, these quote-unquote, “Legacy systems?” How do I use this to monitor a Kubernetes cluster? What's the important parts of building these observability pipelines? If I'm maintaining a library, how should I integrate OpenTelemetry into that library for my users? And so on, and so on, and so forth.And the answers to those questions actually probably aren't going to change a ton over the next four or five years. Which is good because that makes it the perfect thing to write a book about. So, the goal of Learning OpenTelemetry is to help you learn not just how to use OpenTelemetry at an API or SDK level, but it's how to build an observability pipeline with OpenTelemetry, it's how to roll it out to an organization, it's how to convince your boss that this is what you should use, both for new and maybe picking up some legacy development. It's really meant to give you that sort of 10,000-foot view of what are the benefits of this, how does it bring value and how can you use it to build value for an observability practice in an organization?Corey: I think that's fair. 
Looking at the more quote-unquote, “Evergreen,” style of content as opposed to—like, that's the reason for example, I never wind up doing tutorials on how to use an AWS service because one console change away and suddenly I have to redo the entire thing. That's a treadmill I never had much interest in getting on. One last topic I want to get into before we wind up wrapping the episode—because I almost feel obligated to sprinkle this all over everything because the analysts told me I have to—what's your take on generative AI, specifically with an eye toward observability?Austin: [sigh], gosh, I've been thinking a lot about this. And—hot take alert—as a skeptic of many technological bubbles over the past five or so years, ten years, I'm actually pretty hot on AI—generative AI, large language models, things like that—but not for the reasons that people like to kind of hold them up, right? Not so that we can all make our perfect, funny [sigh], deep dream, meme characters or whatever through Stable Fusion or whatever ChatGPT spits out at us when we ask for a joke. I think the real win here is that this to me is, like, the biggest advance in human-computer interaction since resistive touchscreens. Actually, probably since the mouse.Corey: I would agree with that.Austin: And I don't know if anyone has tried to get someone that is, you know, over the age of 70 to use a computer at any time in their life, but mapping human language to trying to do something on an operating system or do something on a computer on the web is honestly one of the most challenging things that faces interface design, face OS designers, faces anyone. And I think this also applies for dev tools in general, right? Like, if you think about observability, if you think about, like, well, what are the actual tasks involved in observability? It's like, well, you're making—you're asking questions. You're saying, like, “Hey, for this metric named HTTPrequestsByCode,” and there's four or five dimensions, and you say, like, “Okay, well break this down for me.” You know, you have to kind of know the magic words, right? You have to know the magic promQL sequence or whatever else to plug in and to get it to graph that for you.And you as an operator have to have this very, very well developed, like, depth of knowledge and math and statistics to really kind of get a lot of—Corey: You must be at least this smart to ride on this ride.Austin: Yeah. And I think that, like that, to me is the real—the short-term win for certainly generative AI around using, like, large language models, is the ability to create human language interfaces to observability tools, that—Corey: As opposed to learning your own custom SQL dialect, which I see a fair number of times.Austin: Right. And, you know, and it's actually very funny because there was a while for the—like, one of my kind of side projects for the past [sigh] a little bit [unintelligible 00:32:31] idea of, like, well, can we make, like, a universal query language or universal query layer that you could ship your dashboards or ship your alerts or whatever. And then it's like, generative AI kind of just, you know, completely leapfrogs that, right? It just says, like, well, why would you need a query language, if we can just—if you can just ask the computer and it works, right?Corey: The most common programming language is about to become English.Austin: Which I mean, there's an awful lot of externalities there—Corey: Which is great. I want to be clear. I'm not here to gatekeep.Austin: Yeah. 
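The "magic words" problem is easy to see with Prometheus: breaking the metric Corey names down by one of its dimensions means already knowing PromQL. A hedged sketch that queries a Prometheus-style HTTP API; the metric name and server address are assumptions for illustration.

```python
import requests

# The magic PromQL sequence: per-status-code request rate over 5 minutes.
query = 'sum by (code) (rate(http_requests_by_code_total[5m]))'

resp = requests.get(
    "http://prometheus:9090/api/v1/query",  # assumed Prometheus endpoint
    params={"query": query},
    timeout=10,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"].get("code"), series["value"][1])
```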
I mean, I think there's a lot of externalities there, and there's a lot—and the kind of hype to provable benefit ratio is very skewed right now towards hype. That said, one of the things that is concerning to me as sort of an observability practitioner is the amount of people that are just, like, whole-hog, throwing themselves into, like, oh, we need to integrate generative AI, right? Like, we need to put AI chatbots and we need to have ChatGPT built into our products and da-da-da-da-da. And now you kind of have this perfect storm of people that really don't ha—because they're just using these APIs to integrate gen AI stuff with, they really don't understand what it's doing because a lot you know, it is very complex, and I'll be the first to admit that I really don't understand what a lot of it is doing, you know, on the deep, on the foundational math side.But if we're going to have trust in, kind of, any kind of system, we have to understand what it's doing, right? And so, the only way that we can understand what it's doing is through observability, which means it's incredibly important for organizations and companies that are building products on generative AI to, like, drop what—you know, walk—don't walk, run towards something that is going to give you observability into these language models.Corey: Yeah. “The computer said so,” is strangely dissatisfying.Austin: Yeah. You need to have that base, you know, sort of, performance [goals and signals 00:34:31], obviously, but you also need to really understand what are the questions being asked. As an example, let's say you have something that is tokenizing questions. You really probably do want to have some sort of observability on the hot path there that lets you kind of break down common tokens, especially if you were using, like, custom dialects or, like, vectors or whatever to modify the, you know, neural network model, like, you really want to see, like, well, what's the frequency of the certain tokens that I'm getting they're hitting the vectors versus not right? Like, where can I improve these sorts of things? Where am I getting, like, unexpected results?And maybe even have some sort of continuous feedback mechanism that it could be either analyzing the tone and tenor of end-user responses or you can have the little, like, frowny and happy face, whatever it is, like, something that is giving you that kind of constant feedback about, like, hey, this is how people are actually like interacting with it. Because I think there's way too many stories right now people just kind of like saying, like, “Oh, okay. Here's some AI-powered search,” and people just, like, hating it. Because people are already very primed to distrust AI, I think. And I can't blame anyone.Corey: Well, we've had an entire lifetime of movies telling us that's going to kill us all.Austin: Yeah.Corey: And now you have a bunch of, also, billionaire tech owners who are basically intent on making that reality. But that's neither here nor there.Austin: It isn't, but like I said, it's difficult. It's actually one of the first times I've been like—that I've found myself very conflicted.Corey: Yeah, I'm a booster of this stuff; I love it, but at the same time, you have some of the ridiculous hype around it and the complete lack of attention to safety and humanity aspects of it that it's—I like the technology and I think it has a lot of promise, but I want to get lumped in with that set.Austin: Exactly. Like, the technology is great. 
The fan base is… ehh, maybe something a little different. But I do think that, for lack of a better—not to be an inevitable-ist or whatever, but I do think that there is a significant amount of, like, this is a genie you can't put back in the bottle and it is going to have, like, wide-ranging, transformative effects on the discipline of, like, software development, software engineering, and white collar work in general, right? Like, there's a lot of—if your job involves, like, putting numbers into Excel and making pretty spreadsheets, then ooh, that doesn't seem like something that's going to do too hot when I can just have Excel do that for me.And I think we do need to be aware of that, right? Like, we do need to have that sort of conversation about, like… what are we actually comfortable doing here in terms of displacing human labor? When we do displace human labor, are we doing it so that we can actually give people leisure time or so that we can just cram even more work down the throats of the humans that are left?Corey: And unfortunately, I think we might know what that answer is, at least on our current path.Austin: That's true. But you know, I'm an optimist.Corey: I… don't do well with disappointment. Which the show has certainly not been. I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you?Austin: Welp, I—you can find me on most social media. Many, many social medias. I used to be on Twitter a lot, and we all know what happened there. The best place to figure out what's going on is check out my bio, social.ap2.io will give you all the links to where I am. And yeah, been great talking with you.Corey: Likewise. Thank you so much for taking the time out of your day. Austin Parker, community maintainer for OpenTelemetry. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment pointing out that actually, physicists say the vast majority of the universe's empty space, so that we can later correct you by saying ah, but it's empty whitespace. That's right. YAML wins again.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Screaming in the Cloud
How Redpanda Extracts Business Value from Data Events with Alex Gallego

Aug 31, 2023 • 34:43


Alex Gallego, CEO & Founder of Redpanda, joins Corey on Screaming in the Cloud to discuss his experience founding and scaling a successful data streaming company over the past 4 years. Alex explains how it's been a fun and humbling journey to go from being an engineer to being a founder, and how he's built a team he trusts to hand the production off to. Corey and Alex discuss the benefits and various applications of Redpanda's data streaming services, and Alex reveals why it was so important to him to focus on doing one thing really well when it comes to his product strategy. Alex also shares details on the Hack the Planet scholarship program he founded for individuals in underrepresented communities. About AlexAlex Gallego is the founder and CEO of Redpanda, the streaming data platform for developers. Alex has spent his career immersed in deeply technical environments, and is passionate about finding and building solutions to the challenges of modern data streaming. Prior to Redpanda, Alex was a principal engineer at Akamai, as well as co-founder and CTO of Concord.io, a high-performance stream-processing engine acquired by Akamai in 2016. He has also engineered software at Factset Research Systems, Forex Capital Markets and Yieldmo; and holds a bachelor's degree in computer science and cryptography from NYU. Links Referenced: Redpanda: https://redpanda.com/ Twitter: https://twitter.com/emaxerrno Redpanda community Slack: https://redpandacommunity.slack.com/join/shared_invite/zt-1xq6m0ucj-nI41I7dXWB13aQ2iKBDvDw Hack The Planet Scholarship: https://redpanda.com/scholarship TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Tired of slow database performance and bottlenecks on MySQL or PostgresSQL when using Amazon RDS or Aurora? How'd you like to reduce query response times by ninety percent? Better yet, how would you like to get me to pronounce database names correctly? Join customers like Zscaler, Intel, Booking.com, and others that use OtterTune's artificial intelligence to automatically optimize and keep their databases healthy. Go to ottertune dot com to learn more and start a free trial. That's O-T-T-E-R-T-U-N-E dot com.Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn, and this promoted guest episode is brought to us by our friends at Redpanda, which I'm thrilled about because I have a personal affinity for companies that have cartoon mascots in the form of animals and are willing to at least be slightly creative with them. My guest is Alex Gallego, the founder and CEO over at Redpanda. Alex, thanks for joining me.Alex: Corey, thanks for having me.Corey: So, I'm not asking about the animal; I'm talking about the company, which I imagine is a frequent source of disambiguation when you meet people at parties and they don't quite understand what it is that you do. And you folks are big in the data streaming space, but data streaming can mean an awful lot of things to an awful lot of people. What is it for you?Alex: Largely it's about enabling developers to build applications that can extract value of every single event, every click, every mouse movement, every transaction, every event that goes through your network. 
This is what Redpanda is about. It's like how do we help you make more money with every single event? How do we help you be more successful? And you know, happy to give examples in finance, or IoT, or oil and gas, if it's helpful for the audience, but really, to me, it's like, okay, if we can give you the framework in which you can build a new application that allows you to extract value out of data, every single event that's going through your network, to me, that's what a streaming is about. It large, it's you know, data contextualized with a timestamp and largely, a sort of a database of event streaming.Corey: One of the things that I find curious about the space is that usually, companies wind up going one of two directions when you're talking about data streaming. Either there, “Oh, just send it all to us and we'll take care of it for you,” or otherwise, it's a, great they more or less ship something that you've run in your own environment. In the olden days of data centers, that usually resembled a box of some sort. You're one of those interesting split-the-difference companies where you offer both models. Do you find that one of those tends to be seeing more adoption these days or that there's an increasing trend toward one direction or the other?Alex: Yeah. So, right now, I think that to me, the future of all these data-intensive products—whether you're a database or a streaming engine—will, because simply of cost of networks transferred between the hybrid clouds and your accounts, sending a gigabyte a second of data between, let's say, you know, your data center and a vendor, it's just so expensive that at some point, from just a cost perspective, like, running the infrastructure, it's in the millions of dollars. And so, running the data inside your VPC, it's sort of the next logical evolution of how we've used to consume services. And so, I actually think it's just the evolution: people would self-host because of costs and then they would use services because of operational simplicity. “I don't want to spend team skills and time building this. I want to pay a vendor.”And so, BYOC, to be honest—which is what we call this offering—it was about [laugh] sidestepping the costs and of being stuck in the hybrid clouds, whether it's Google or Amazon, where you're paying egress and ingress costs and it's just so expensive, in addition to this whole idea of data residency or data sovereignty and privacy. It's like, yeah, why not both? Like, if I'm an engineer, I want low latency and I don't want to pay you to transfer this thing to the next rack. I mean, my computer's probably, like, you know, a hundred feet away from my customer's computer. Like, why [laugh] way is that so complicated? So, you know, my view is that the future of data-intensive products will be in this form of where it—like, data planes are actually owned by companies, and then you offer that as a Software as a Service.Corey: One of the things that catches an awful lot of companies with telemetry use cases—or data streaming as another example of that—by surprise when they start building their own cloud-hosted offering is that they're suddenly seeing a lot more cross-AZ data charges than they would have potentially expected. And that's because unlike cross-region or the really expensive version of this with egress, it's a penny in and a penny out per gigabyte in most of AWS regions. Which means that that isn't also bound strictly to an AWS organization. 
So, you have customers co-located with you and you're starting to pay ingress charges on customers throwing their data over to you. And, on some level, the most economical solution for you is well, we're just going to put our listeners somewhere else far away so that we can just have them pay the steep egress fee but then we can just reflect it back to ourselves for free.And that's a terrible pattern, but it's a byproduct of the absolutely byzantine cross-AZ data transfer pricing, in fact, all of the data transfer pricing that is at least AWS tends to present. And it shapes the architectural decisions you make as a result.Alex: You know, as a user, it just didn't make sense. When we launched this product, the number of people that says like, “Why wouldn't your charge for, you know, effectively renting [unintelligible 00:05:14], and giving a markup to your customers?” That's we don't add any value on that, you know? I think people should really just pay us for the value that we create for them. And so, you know, for us competing with other companies is relatively easy.Competing with MSK is it's harder because MSK just has this, you know, muscle where they don't charge you for some particular network traffic between you. And so, it forces companies like us that are trying to be innovative in the data space to, like, put our services in that so that we can actually compete in the market. And so, it's a forcing function of the hybrid clouds having this strong muscle of being able to discount their services in a way that companies just simply don't have access to. And then, you know, it becomes—for the others—latency and sovereignty.Corey: This is the way that effectively all of AWS has first-party offerings of other things go. Replication traffic between AZs is not chargeable. And when I asked them about that, they say, “Oh, yeah. We just price that into the cost of the service.” I don't know that I necessarily buy that because if I try and run this sort of thing on top of EC2, it would cost me more than using their crappy implementation of it, just in data transfer alone for an awful lot of use cases.No third party can touch that level of cost-effectiveness and discounting. It really is probably the clearest example I can think of actual anti-competitive behavior in the market. But it's also complex enough to explain, to, you know, regulators that it doesn't make for exciting exposés and the basis for lawsuits. Yet. Hope springs eternal.Alex: [laugh]. You know—okay, so here is how—if someone is listening to this podcast and is, like, “Okay, well, what can I do?” For us, S3 is the answer. S3 is basically you need to be able to lean in into S3 as a way of replication across [AZ 00:06:56], you need to be able to lean into S3 to read data. And so actually, when I wrote, originally, Redpanda, you know, it's just like this C++ thing using [unintelligible 00:07:04], geared towards super low latency.When we moved it into the cloud, what we realized is, this is cost prohibitive to run either on EBS volumes or local disk. I have to tier all the storage into S3, so that I can use S3's cross-AZ network transfer, which is basically free, to be able to then bring a separate cluster on a different AZ, and then read from the bucket at zero cost. And so, you end up really—like, there are fundamental technical things that you have to do to just be able to compete in a way that's cost-effective for you. 
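A rough sketch of the pattern Alex describes: brokers tier closed log segments into S3, and a cluster in another AZ replays them from the bucket instead of pulling them across the AZ boundary. The bucket name and key layout are invented for illustration, not Redpanda's actual on-disk format.

```python
import boto3

s3 = boto3.client("s3")
bucket = "tiered-storage-demo"  # hypothetical bucket of tiered segments

# List the segments for one partition and replay them in order; reading from
# S3 sidesteps the per-gigabyte cross-AZ transfer charge that broker-to-broker
# replication would incur.
pages = s3.get_paginator("list_objects_v2").paginate(
    Bucket=bucket, Prefix="topics/clicks/partition-0/"
)
for page in pages:
    for obj in page.get("Contents", []):
        segment = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        # ...decode records out of the segment bytes here...
```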
And so, in addition to just, like, the muscle that they can enforce on the companies is—it—there are deep implications of what it translates to at the technical level. Like, at the code level.Corey: In the cloud, more than almost anywhere else, it really does become apparent that cost and architecture are fundamentally the same thing. And I have a bit of an advantage here in that I've seen what you do deployed at least one customer of mine. It's fun. When you have a bunch of logos on your site, it's, “Hey, I recognize some of those.” And what I found interesting was the way that multiple people, when I spoke to them, described what it is that you do because some of them talked about it purely as a cost play, but other people were just as enthusiastic about it being a means of improving feature velocity and unlocking capabilities that they didn't otherwise have or couldn't have gotten to without a whole lot of custom work on their part. Which is it? How do you view what it is that you're bringing to market? Is it a cost play or is it a capability story?Alex: From our customer base, I would say 40% is—of our customer base—is about Redpanda enabling them to do things that they simply couldn't do before. An example is, we have, you know, a Fortune 100 company that they basically run their hedge trading strategy on top of Redpanda. And the reason for that is because we give them a five-millisecond average latency with predictable flight latencies, right? And so, for them, that predictability of Redpanda, you know, and sort of like the architecture that came about from trying to invent a new storage engine, allows them to throw away a bunch of in-house, you know, custom-built pub/sub messaging that, you know, basically gave them the same or worse latency. And so, for them, there's that.For others, I think in the IoT space, or if you have flying vehicles around the world, we have some logos that, you know, I just can't mention them. But they have this, like, flying computers around the world and they want to measure that. And so, like, the profile of the footprint, like, the mechanical footprint of being able to run on a single Pthread with a few megs of memory allows these new deployment models that, you know, simply, it's just, it's not possible with the alternatives where let's say you have to have, you know, like, a zookeeper on the schema registry and an HTTP proxy and a broker and all of these things. That simply just, it cannot run on a single Pthread with a few megs of memory, if you put any sort of workload into that. And so, it's like, the computational efficiencies simply enable new things that you couldn't do before. And that's probably 40%. And then the other, it's just… money was really cheap last year [laugh] or the year before and I think now it's less cheap [unintelligible 00:10:08] yeah.Corey: Yeah, I couldn't help but notice that in my own business, too. It turns out that not giving a shit about the AWS bill was a zero-interest-rate phenomenon. Who knew?Alex: [laugh]. Yeah, exactly. And now people [unintelligible 00:10:17], you know, the CIOs in particular, it's like, help. And so, that's really 60%, and our business has boomed since.Corey: Yeah, one thing that I find interesting is that you've been around for only four years. I know that's weird to say ‘only,' but time moves differently in tech. And you've started showing up in some very strange places that I would not have expected. 
You recently—somewhat recently; time is, of course, a flat circle—completed $100 million Series C, and I also saw you in places where I didn't expect to see you in the form of, last week, one of your large competitor's earnings calls, where they were asked by an analyst about an unnamed company that had raised $100 million Series C, and the CEO [unintelligible 00:11:00], “Oh, you're probably talking about Redpanda.” And then they gave an answer that was fine.I mean, no one is going to be on an earnings call and not be prepared for questions like that and to not have an answer ready to go. No one's going to say, “Well, we're doomed if it works,” because I think that businesses are more sophisticated than that. But it was an interesting shout-out in a place where you normally don't see competitors validate that you're doing something interesting by name-checking you.Alex: What was fundamentally interesting for me about that, is that I feel that as an investor, if you're putting you know, 2, 3, 4, or $500 million check into a public position of a company, you want to know, is this money simply going to make returns? That's basically what an investor cares about. And so, the reason for that question is, “Hey, there's a Series C startup company that now has a bunch of these Fortune 2000 logos,” and you know, when we talked to them, like, their customer [unintelligible 00:11:51] phenomena, like, why is that the case? And then, you know, our competitor was forced to name, you know, [laugh] a single win. That's as far as I remember it. We don't know of any additional customers that have switched to that.And so, I think when you have, like, you know, your win rate is above, whatever, 95%, 97% ratio, then I think, you know, they're just sort of forced to answer that. And in a way, I just think that they focus on different things. And for me, it was like, “Okay, developer, hands on keyboard, behind the terminal, how do I make you successful?” And that seems to have worked out enough to be mentioned in the earnings call.Corey: On some level, it's a little bit of a dog-and-pony show. I think that as companies had a certain point of scale, they feel that they need to validate what they're doing to investors at various points—which is always, on some level, of concern—and validate themselves to analysts, both financial—which, okay, whatever—and also, industry analysts, where they come with checklists that they believe is what customers want and is often a little bit off of the mark. But the validation that I think that matters, that actually determines whether or not something has legs is what your customers—you know, people paying you money for a thing—have to say and what they take away from what you're doing. And having seen in a couple of cases now myself, that usage of Redpanda has increased after initial proofs of concept and putting things on to it, I already sort of know the answer to this, but it seems that you also have a vibrant community of boosters for people who are thrilled to use the thing you're selling them.Alex: You know, Jumptraders recently posted that there was a use case in the new stack where they, like, put for the most mission-critical. So, for those of you that listening, Jumptraders is financial company, and they're super technical company. One of, like, the hardest things, they'll probably put your [unintelligible 00:13:35] your product through some of the most rigorous testing [unintelligible 00:13:38]. So, when you start doing some of these logos, it gives confidence. 
And actually, the majority of our developers that we get to partner with, it was really a friend telling a friend; for [laugh] the longest time, my marketing department was super, super small.

And then what's been fun, some, like, really different use case was the one I mentioned about on this, like, flying vehicles around the world. They fly both in outer space and in airplanes. That was really fun. And then the large one is when you have workloads at, like, 14-and-a-half gigabytes per second, where the alternative of using something like Kinesis in the case of Lacework—which, you know, they wrote a New Stack article about—would be so exorbitantly expensive. And so, in a way, I think that, you know, just trying to make the developers successful, really focusing, honestly, on the person who just has to make things work. We don't—by the time we get to the CIO, really the champion was the engineer who had to build an application. “I was just trying to figure out the whack-a-mole of trying to debug alternative systems.”

Corey: One of the, I think, seductive problems with your entire space is that no one decides day one that they're going to implement a data streaming solution for a very scaled-out, high-traffic site. The early adoption is always a small thing that you're in the process of building. And at that scale, at that speed, it just doesn't feel like it's that hard of a problem because scale introduces its own unique series of challenges, but it's often one that people only really find out for themselves when the simple thing that works in theory but not in production starts to cause problems internally. I used to work with someone who was a deeply passionate believer in Apache Kafka to a point where it almost became a problem, just because their answer to every problem—it almost didn't matter if it was, “How do we get more coffee this morning?”—Kafka would be the answer for all of it. And that's great, but it turned out, they became one of these people that borderline took on a product or a technology as their identity. So, anything that would potentially take a workload away from that got a lot of internal resistance. I'm wondering if you find that you're being brought in to replace existing systems or for completely greenfield stuff. And if the former, are you seeing a lot of internal resistance from people who have built a little niche for themselves?

Alex: It's true, the people that have built a career, especially at large banks, were a pretty good fit for—you know, they actually get a team, they got a promotion cycle because they brought this technology and the technology sort of helped them make money. I personally tend to love to talk to these people. And there was a ca—to me, like, technically, let's talk about, like, deeply technical. Let me help you. That obviously doesn't scale because I can't have the same conversation with ten people. So, we do tend to see some of that. Actually, from our customers' standpoint, I would say that the large part of our customer base, you know, if I'm trying to put numbers, maybe 65%, is probably rip-and-replace of, you know, either upstream Apache software or private companies or hosted services, et cetera. And so, I think you're right in saying, “Hey, that resistance,” they probably handled the [unintelligible 00:16:38], but what changed in the last year is that the CIO now stepped in and says, “I am going to fire all of you or you have to come up with a $10 million savings. Help me.” [laugh].
And so, you know, then really, my job is to help them look like a hero. It's like, “Hey, look, try it, test it, benchmark it with your own workload, and if it saves you money, then use it.” That's been, you know, sort of super helpful kind of given the macroeconomic environment. And then the last one is sometimes, you know, you do have to go with a greenfield, right? Like, someone has built a career, they want to gain confidence, they want to ask you questions, they want to trust you that you don't lose data, they want to make sure that you do say the things that you want to say. And so, sometimes it's about building trust and building that relationship.

And developers are right. Like, there's a bunch of products out there. Like, why should I trust you? And so, a little easier time, probably now, that you know, with the CIOs wanting to cut costs, and now you have an excuse to go back to the executive team and say, “Look, I made you look smart. We get to [unintelligible 00:17:35], you know, our systems can scale to this.” That's easy. Or the second one is we do, you know, we'll start with some side use case or a greenfield. But both exist, and I would say 65% is probably rip-outs.

Corey: One question I love to—I wouldn't call it ambush, but definitely come up with—that catches some folks by surprise is one of the ways I like to sort out zealots from people who are focused on business problems. Do you have an example of a data streaming workload for which Redpanda would not be a great fit?

Alex: Yeah. Database-style queries are not a fit. And so, I think that there was a streaming engine before that was trying to build a database on top of it, and, like—and probably it does work in some low volume of traffic, like, say 5, 10 megabytes per second, but when you get to actual large scale, it just doesn't work. And it doesn't work because what Redpanda is, it gives you two properties as a developer. You can add data to the end or you can truncate the head, right? And so, because those are your only two operations on the log, then you have to build this entire caching level to be able to give these database semantics. And so, you know, I think for that, the future isn't for us to build a database—just as an example—it's really to almost invert it. It's like, hey, what if we make our format an open format like Apache Iceberg and then bring in your favorite database? Like, bring in, you know, Snowflake or Athena or Trino or Spark or [unintelligible 00:18:54] or [unintelligible 00:18:55] or whatever the other [unintelligible 00:18:56] of great databases that are better than we are, and doing, you know, just MPP, right, like a massively parallelizable database, do that, and then the job for us, for [unintelligible 00:19:05], let me just structure your log in a way that allows you to query, right? And so, for us, when we announced the $100 million Series C funding, it's like, I'm going to put the data in an Iceberg format so you can go and query it with the other ten databases. And they do a better job at that than we do.

Corey: It's, frankly, refreshing to see a vendor that knows where, okay, this is where we start and this is where we stop, because it just seems that there's been an industry-wide push for a while now to oh, you built a component in a larger system that works super well. Now, expand to do everything else in the architectural diagram.
And you suddenly have databases trying to be network transport layers and queues trying to be data warehouses, and it just doesn't work that way. It just feels like, oh, this is a terrible approach to solving this particular problem. And what's worse, to my mind, is that people who hadn't heard of you before look at you through this lens that does not put you in your best light, and, “Oh, this is a terrible database.” Well, it's not supposed to be one.

Alex: [laugh].

Corey: But it also—it puts them off as a result. Have you faced pressure to expand beyond your core competency from either investors or customers or analysts or, I don't know, the voices late at night that I hear and I assume everyone else does, too?

Alex: Exactly. The 3 a.m. voice that I have to take my phone and take a voice note because it's like, I don't want to lose this idea. Totally. For us, I think there's pressures, like, hey, you built this great engine. Why don't you add, like, the latest, you know, soupe du jour in systems—like a vector database. I was like, “This doesn't even make any sense.” For me, it's, I want to do one thing really well. And I generally call it internally ‘the ring zero.' It's, if you think of the internet, right, like, as a computer, especially with this move to what we talked about earlier in a BYOC, like, we could be the best ring zero, the best sort of, like, you know, messaging platform for people to build real-time applications. And then that's the case and there's just so much low-hanging fruit for us. Like, the developer experience wasn't great for other systems, like, why don't we focus on the last mile, like, making that developer, you know, successful at doing this one thing as opposed to being average at a hundred other products? And until we feel, honestly, that we've done a phenomenal job at that—I think we still have some roadmap to get there—I don't want to expand. And, like, if there's pressure, my answer is, like… look, the market is big enough. We don't have to do it. We're still, you know, growing. I think it's obviously not trivial and I'm kind of trivializing a bunch of problems from a business perspective. I'm not trying to degrade anyone else. But for us, it's just being focused. This is what we do well. And bring every other technology that makes you successful. I don't really care. I just want to make this part well.

Corey: I think that that is something that's under-appreciated. I feel like I should get, at one point, to something that's been nagging at the back of my mind. Some would call it a personal attack and I suppose I'll let them, but what I find interesting is your background. Historically, you were a distributed systems engineer at very large scale. And you apparently wrote the first version of Redpanda yourself in—was it C or C++?

Alex: C++.

Corey: Yeah. And now you are the CEO of a company that is clearly doing very well. Have you gotten the hell out of production yet? The reason I ask this is I have worked in a number of companies where the founder was also the initial engineer and then they invariably treated main as their feature branch and the rest of us all had to work around them to keep them from, you know, destroying everything we were trying to build around us, due to missing context. In other words, how annoyed with you are your engineers on any given afternoon?

Alex: [laugh]. Yeah. I would say that as a company builder now, if I may say that, the team is probably the thing I'm the most proud of.
They're just so talented, such a good [unintelligible 00:22:47] of humans. And so—group of humans—I stopped coding about two years ago, roughly.

So, the company is four-and-a-half years old; really the first two-and-a-half years—the first one, two years, definitely—I was personally putting in, like, tons and tons of hours working on the code. It was a ton of fun. To me, one of the most rewarding technical projects I've ever had a chance to do. I still read pull requests, though, just so that when I have a conversation with a technical leader, I'm not, like, I have no clue how the transactions work. So, I still have to read the code, but I don't write any more code, and my heart was a little broken when my dev prod team removed my write access to the GitHub repo. We got SOC2 compliance, and they're like, “You can't have access to being an admin on Google domains, and you're no longer able to write into main.” And so, I think as a—I don't know, maybe my identity—my self-identity is that of a builder, and I think as long as I personally feel like I'm building—today, it's not code, but, you know, it's the company and [unintelligible 00:23:41] sort of culture—then I feel okay [laugh]. But yeah, I no longer write code. And the last story on that is this—an engineer of ours, his name is [Stefan 00:23:51], he's like, “Hey, so Alex wrote this semaphore”—this was actually two days ago—and so they posted a video, and I commented, I was like, “Hey, this was the context of the semaphore. I'm sorry for this bug I caused.” But yeah, at least I still remember some context for them.

Corey: What's fun is watching things continue to outpace and outgrow you. I mean, one of the hard parts of building a company is the realization that every person you hire for a thing that's now getting off of your plate is better at that thing than you are. It's a constant experience of being humbled. And at some point, things wind up outpacing you to the point where, at least in my case, I've been on calls with customers and I explained how we did some things and how it worked and had to be corrected by my team of, “Well. That used to be true, however…” like, “Oh, dear Lord. I'm falling behind.” And that's always been a weird feeling for me.

Alex: Totally. You know, it's the feeling of being—before I think I became a CEO, I was a highly comped engineer and I was competent, to the extent that it allowed me to build this product. Then you start doing all of these things and you're incompetent, obviously, by definition, because you haven't done those things, and so there's, like, that discomfort [laugh]. But I have to get it done because no one else wants to do, whatever, like say, like, you know, rev ops or marketing or whatever. And then you find somebody who's great and you're like, oh my God, I was like, I was so poor tactically at doing this thing. And it's definitely humbling every day. And it's almost, it's, like, gosh, this year was kind of this role where you're just, like, mediocre at, like, a whole lot of things as a company, but you're the only person that has to do the job because you have the context and you just have to go and do it. And so, it's definitely humbling. And in some ways, I'm learning, so for me today, it's still a lot of fun to learn.

Corey: This is a little more in the weeds, I suppose, but I always love to ask people these questions. Because I used to be naive, which meant that I had hope and I saw a brighter future in technology. I now know that was all a lie.
But I used to believe that out there was some company whose internal infrastructure for what they'd built was glorious and it would be amazing. And I knew I would never work there, nor would I want to, because when everything's running perfectly, all I can really do is mess that up; there's no way to win and a bunch of ways to lose.

But I found that place doesn't exist. Every time I talk to someone about how they built the thing that they built and I ask them, “If you were starting over from scratch, what would you do differently?” The answer often distills down to, “Oh, everything.” Because it's an organically evolving system that, oh yeah, everything's easier the second time. At least you get to find new failure modes going in that way. When you look back at how you designed it originally, are there any missteps that you could have saved yourself a whole lot of grief by not making the first time?

Alex: Gosh, so many things. But if I were to give Hollywood highlights on these things, something that [unintelligible 00:26:35] is, does well is exposing these high-level data types of, like, streams, and lists and maps, et cetera. And I was like, “Well, why couldn't streams offer this as a first-class citizen?” And we got some things well which I think we would still do—like, the whole [thread recorder 00:26:49] could—like, the fundamentals of the engine I would still do the same. But, you know, exposing new programming models earlier in the life of the product, I think, would have allowed us to capture even more wildly different use cases.

But now we kind of have this production engine, we have to support the Fortune 2000, so you know, it's kind of like a very delicate evolution of the product. Definitely would have changed—I would have added, like, custom data types upfront, I would have pushed a little harder on, I think, WebAssembly than we did originally. Man, I could just go on for—like, [added detail 00:27:21], I would definitely have changed things. Like, I would have pressed on the first—on the version of the cloud that we talked about early on, that as the first deployment mode. If we go back through the stack of all of the products you had, it's funny, like, 11 products that are surfaced to the customers to, like, business lines, I would change fundamental things about just [laugh], you know, everything else. I think that's maybe the curse of the expert. Like, you know, you could always find improvements.

Corey: Oh, always. I still look back at my career before starting this place when I was working in a bunch of finance companies, and—I'll never forget this; it was over a decade ago—we were building out our architecture in AWS, and doing a deal with a large finance company. And they said, “Cool, where's your data center?” And I said, “Oh, it's AWS.” And they said, “Ha ha ha ha. Where's your data center?” And that was, oh, okay, great. Now, it feels like if that's their reaction, they have not kept pace with the times. It feels like it is easier to go to a lot of very serious enterprises with very serious businesses and serious workload concerns attendant to those and not get laughed out of the room because you didn't wind up doing a multi-million-dollar data center build-out with an eye toward making it look as enterprise-y as possible.

Alex: Yeah. Okay, so here's, I think, maybe something a little bit controversial. I think that's true. People are moving to the cloud, and I don't think that that idea, especially when we go and talk to banks, is true.
They're like, “Hey, I have this contract with one of the hybrid clouds.”—you know, it's usually with two of them, and then you're like—“This is my workload. I want to spend $70 million or $100 million. Who could give me the biggest discount?” And then you kind of shop it around.

But what we are seeing is that effectively, the data transfer costs are so expensive, and running this for such a large volume of traffic is still so, so expensive, that there is an inverse [unintelligible 00:29:09] to host from some category of the workload where you don't have dynamism. Actually hosting it in your data center is, like, a huge boon in terms of cost efficiencies for the companies, especially where we are and especially in finance—you mentioned that—if you're trying to trade and you have this, like, steady-state line from nine to five, whatever, eight to four, whenever the markets open, it's actually relatively cost-efficient because you can measure, hey, look, you know, the New York Stock Exchange is 1.5 gigabytes per second at market close. Like, I could provision my hardware to beat this. And like, it'll be that I don't need this dynamism that the cloud gives me. And so yeah, it's kind of fascinating for us, because we offer the self-hosted Redpanda, which can adapt to super low latencies with kernel parameter tuning, and the cloud, due to the tiered storage we talked about, S3 being [unintelligible 00:29:52] to, so it's been really fun to participate in deployments where we have both. And you couldn't—they couldn't look more different. I mean, it almost looks like two companies.

Corey: One last question before we wind up calling it an episode. I think I saw something fly by on Twitter a while back as I slowly returned to the platform—no, I'm not calling it X—something you're doing involving a scholarship. Can you tell me a bit more about that?

Alex: Yeah. So, you know, I'm a Latino CEO, first generation in the States, and some of the things that I felt really frustrated with, growing up—that, like, I feel fortunate because I got to [unintelligible 00:30:25]—that is that, you know, people that look like me are probably given some bullshit QA jobs, so like, you know, behemoth job, I think, for a bank. And so, I wanted to change that. And so, we give money and mentorship to people and we release all of the intellectual property. And so, we mentor someone—actually, anyone from underrepresented backgrounds—for three months.

We give them, like, 1200 bucks a month—or 1500, I can't remember—mentorship from our top principal-level engineers that have worked at Amazon and Google and Facebook and basically the world's top companies. And so, they meet with them one hour a week, we give them money, they could sit on the couch if they want to. No one has to [unintelligible 00:31:06]. And all we're trying to do is, like, “Hey, if you are part of this group, go and try to build something super hard.” [laugh]. And often their minds, which is great, and they're like, “I want to build an OpenAI competitor in three months, and here's the week-by-week progress.” Or, “I want to build a new storage engine, new database in three months.” And that's the kind of people that we want to help—these, like, super ambitious people that just haven't had a chance to be mentored by some of the world's best engineers. And I just want to help them. Like, we—this is a non-scalable project. I meet with them once a week.
I don't want to have a team of, like, ten people. Like, to me, I feel like the most valuable thing I could do is to give them my time and to mentor them. I was like, “Hey, let's think about this problem. Let's decompose this. How do you think about this?” And then bring you the best engineers that, you know, work with me, and let me help you think about problems differently and give you some money. And we just don't care how you use the time or the money; we just want people to work on hard problems. So, it's active. It runs once a year, and if anyone is listening to this, if you want to send it to your friends, we'd love to have that application. It's for anyone in the world, too, as long as we can send the person a check [laugh]. You know, my head of finance is not going to walk to a Moneygram—which we have done in the past—but other than that, as long as you have a bank account that we can send the check to, you should be able to apply.

Corey: That is a compelling offer, particularly in the current macro environment that we find ourselves faced with. We'll definitely put a link to that into the [show notes 00:32:32]. I really want to thank you for taking the time to, I guess, get me up to speed on what it is you're doing. If people want to learn more, where's the best place for them to go?

Alex: On Twitter, my handle is @emaxerrno, which stands for the largest error in the kernel. I felt like that was apt for my handle. So, that's one. Feel free to find me on the community Slack. There's a Slack button on the website redpanda.com on the top right. I'm always there if you want to DM me. Feel free to stop by. And yeah, thanks for having me. This was a lot of fun.

Corey: Likewise. I look forward to the next time. Alex Gallego, CEO and founder at Redpanda. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment that I will almost certainly never read because they have not figured out how to get data from one place to another.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
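A note on the two log primitives Alex describes in the interview above (append records at the tail, truncate expired records from the head): the contract is small enough to sketch. What follows is a toy illustration in Java, not Redpanda's actual C++ engine; the class and method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the two log primitives discussed above: append to the
// tail, truncate from the head. A real engine persists to disk and
// indexes by offset; this sketch only shows the API shape.
final class AppendOnlyLog {
    private final Deque<String> records = new ArrayDeque<>();
    private long headOffset = 0; // offset of the oldest retained record

    // The only write path: add a record at the end of the log.
    long append(String record) {
        records.addLast(record);
        return headOffset + records.size() - 1; // offset of the new record
    }

    // The only delete path: drop everything older than `offset`.
    void truncateHead(long offset) {
        while (headOffset < offset && !records.isEmpty()) {
            records.removeFirst();
            headOffset++;
        }
    }

    String read(long offset) {
        if (offset < headOffset || offset >= headOffset + records.size()) {
            throw new IndexOutOfBoundsException("offset not retained: " + offset);
        }
        // ArrayDeque has no random access; copying to an array is fine for a toy.
        return records.toArray(new String[0])[(int) (offset - headOffset)];
    }

    public static void main(String[] args) {
        AppendOnlyLog log = new AppendOnlyLog();
        log.append("tick-1");
        long o = log.append("tick-2");
        log.truncateHead(o);             // retention: drop the head
        System.out.println(log.read(o)); // prints "tick-2"
    }
}
```

The sketch also makes the database point concrete: these two operations give you sequential, offset-based access and nothing else, so query-style workloads need an extra caching or indexing layer on top, or an open table format such as Iceberg that a real query engine can read.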

Streaming Audio: a Confluent podcast about Apache Kafka
What Could Go Wrong with a Kafka JDBC Connector?

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 4, 2022 41:10 Transcription Available


Java Database Connectivity (JDBC) is the Java API used to connect to a database. Because the JDBC connector is one of the most popular Kafka connectors, it's important to prevent issues in your integrations. In this episode, we'll cover how a JDBC connection works, and common issues with your database connection.

Why the Kafka JDBC connector? When it comes to streaming database events into Apache Kafka®, the JDBC connector usually represents the first choice for its flexibility and its ability to support a wide variety of databases without requiring custom code. As an experienced data analyst, Francesco Tisiot (Senior Developer Advocate, Aiven) delves into his experience of streaming a Kafka data pipeline with the JDBC source connector and explains what could go wrong. He discusses alternative options available to avoid these problems, including the Debezium source connector for real-time change data capture.

The JDBC connector is a Kafka Connect plugin, built on the Java JDBC API, that streams data between databases and Kafka. If you want to stream data from a relational database into Kafka once per day or every two hours, the JDBC connector is a simple, batch-processing connector to use. You can tell the JDBC connector which query you'd like to execute against the database, and the connector will then take the data into Kafka (see the configuration sketch after the episode links). The connector works well with basic data types out of the box; however, database-specific data types, such as geometric columns and array columns in PostgreSQL, aren't represented well by the JDBC connector. You might end up with no results in Kafka because the column is not within the connector's supported capabilities. Francesco shares other cases that would cause the JDBC connector to go wrong, such as:
* Infrequent snapshot times
* Out-of-order events
* Non-incremental sequences
* Hard deletes

To help avoid these problems and set up a reliable source of events for your real-time streaming pipeline, Francesco suggests other approaches, such as the Debezium source connector for real-time change data capture. The Debezium connector has enhanced metadata, timestamps of the operation, access to all logs, and provides sequence numbers so you can speak the language of a DBA. They also talk about the governance tool that Francesco has been building, and how streaming Game of Thrones sentiment analysis with Kafka started his current role as a developer advocate.

EPISODE LINKS
* Kafka Connect Deep Dive – JDBC Source Connector
* JDBC Source Connector: What could go wrong?
* Metadata parser
* Debezium Documentation
* Database Migration with Apache Kafka and Apache Kafka Connect
* Watch the video version of this podcast
* Francesco Tisiot's Twitter
* Kris Jenkins' Twitter
* Streaming Audio Playlist
* Join the Confluent Community
* Learn more on Confluent Developer
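For context on the setup discussed in this episode, registering a JDBC source connector amounts to posting a JSON config to the Kafka Connect REST API. A minimal sketch in Java follows; the worker address, database coordinates, and table name are made up, and the property names follow the Confluent JDBC source connector docs, so verify them against your connector version. Note how incrementing mode relates to the gotchas above: it only ever sees rows with a larger value in the incrementing column, so updates, hard deletes, and non-incremental sequences never reach Kafka.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterJdbcSource {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector config: "incrementing" mode polls for rows
        // whose `id` is larger than the last one seen, so deleted or updated
        // rows are silently skipped -- one of the issues in the episode.
        String config = """
            {
              "name": "pg-orders-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
                "connection.user": "kafka",
                "connection.password": "secret",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "table.whitelist": "orders",
                "topic.prefix": "pg-",
                "poll.interval.ms": "5000"
              }
            }""";

        // POST the config to a Connect worker (address assumed here).
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8083/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(config))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```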

Python Podcast
PostgreSQL and MariaDB

Python Podcast

Play Episode Listen Later Jun 14, 2022 163:49


Over three years ago we already did an episode about databases. Since that was a while back, we thought it might be time to talk about this topic again. This time we (Dominik and Jochen) sat down with Susanne, who has been offering consulting and training on the subject for many years. The old database episode was our longest episode to date, and somehow this one has also turned out longer than usual. Apparently there is more to say about databases than about other topics.

Data on Kubernetes Community
What we've learned from running a PostgreSQL managed service on Kubernetes (DoK Day EU 2022) // Oleksii Kliukin

Data on Kubernetes Community

Play Episode Listen Later May 28, 2022 11:06


https://go.dok.community/slack
https://dok.community/

From DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE)

Kubernetes is an emerging platform of choice for deploying and running PostgreSQL. Deploying 100 Postgres clusters is as easy as deploying one, and there is no need to tinker with tools like Ansible or Puppet. Resource sharing can be applied when it makes sense, allowing multiple Postgres databases to run in isolation on a single instance, each storing its data on a dedicated persistent volume. There are great open-source tools out there to deal with high availability and backups that support, or can easily be integrated into, the Kubernetes workflow. Monitoring and alerting are easy to implement. People have reported success in running Postgres on Kubernetes before. But there are also rough edges, like memory management or certain Postgres maintenance operations, such as installing extensions, that normally cause unnecessary database downtime on Kubernetes. They are less of a problem for in-house deployments, but may become a deciding factor when running a managed service, competing with other such services running on bare-metal servers or virtual machines that are free of those issues. In this talk, I will share some of our learnings from running a managed PostgreSQL/TimescaleDB service on Kubernetes on AWS for a little more than a year:
* I'll start with the motivation for running managed PostgreSQL on Kubernetes, the benefits and drawbacks.
* I'll describe the architecture of the managed PostgreSQL cloud on Kubernetes.
* I'll zoom in on how we solved some of the Kubernetes-specific issues within our cloud, such as upgrading extensions without downtime, taming the dreaded OOM killer, and doing regular maintenance and PostgreSQL major upgrades.
* I'll share how open-source tools from the PostgreSQL ecosystem help us run the service, and explain how we use them in a slightly non-trivial way.

Oleksii has been working with PostgreSQL for almost 20 years, and has been deploying Postgres on Kubernetes since 2016, when his team at Zalando started the internal managed PostgreSQL service based on the in-house and open-source postgresql-operator. Around 2015, with some other team members, he started working on a PostgreSQL HA project that later became Patroni. Long before that he was hacking PostgreSQL source code to implement binary replication on PostgreSQL 7.x, authoring some PostgreSQL extensions and contributing to core PostgreSQL itself. He started PostgreSQL meetups in Berlin in 2015 and hopes to get back to meeting in person somewhere in 2022. Being Ukrainian, he has lived in Berlin for a bit more than 9 years with his wife, two children and numerous plants.

Der Data Analytics Podcast
MySQL, PostgreSQL, etc.: Functions of a DBMS (Database Management System)

Der Data Analytics Podcast

Play Episode Listen Later Dec 12, 2021 10:58


Database management systems such as MySQL, PostgreSQL, Oracle, etc. have various functions, which I describe today.

The Swyx Mixtape
Journey to TimeScaleDB [Mike Freedman]

The Swyx Mixtape

Play Episode Listen Later Sep 1, 2021 12:19


Listen to the full interview on SEDaily: https://softwareengineeringdaily.com/2021/06/28/timescale-time-series-databases-with-mike-freedman/

We originally created Timescale really from our own need. Around that time, 2014-2015, my co-founder and I, Ajay Kulkarni, who we go back many years, we resynced up and we started thinking about it being a good time for both of us to think about what the next challenges are that we want to tackle. It seemed to us that there was this emerging trend of, now, people talk about the digitization, or digital transformation. It feels like somewhat of an analyst term, but I think it's really responsive of what's happening, in that if you think about the large, big IT revolution, it was about changing the back office. What used to be on paper was now in computers. What we saw was somewhat the same thing happening to basically every industry, from heavy industry, to shipping, to logistics, to manufacturing, both discrete and continuous, and home IoT. Sometimes this gets blurred under IoT, but we also think about it more broadly as operational technology, those which are not necessarily bits, but atoms. A big part of that was actually collecting data of what those systems were doing. It's about sensors and data and whatnot. When we initially looked at this problem, we were thinking about a type of data platform we would want to build, to make it easy to collect and store and analyze that type of data. I think that's a way that we're slightly different, or why our, what we ultimately built as our database ended up being fairly different than a lot of other so-called time series databases. That's because many of them arose out of IT monitoring, where they were trying to collect metrics from servers, where we were originally thinking about collecting data more broadly from all these types of applications and devices around your world. When we started building it, it was originally focusing mostly on IoT. We quickly ran into this problem that the existing databases out there and the time series databases out there were not really designed for our problems. They were often much more limited, because they were focusing on this narrow infrastructure monitoring problem, where the data maybe wasn't as important. It was only a very specific type. Let's say, they stored only floats. They didn't have extra metadata that they wanted to enrich their data with to better understand what was going on, like through joins. After basically working on this platform for about a year, we somewhat came to the conclusion that we actually needed to build somewhat of our own time series database that was focusing on this broader type of problem, and so that's what we did. That's what led to the development of what became Timescale.

JM: Today, what are the most common applications of a time series database?

I can speak mostly about, obviously, TimescaleDB, rather than, as I was alluding to before, a lot of the other time series databases, which are much more narrowly focused on IT monitoring, or observability. We really see our use cases across the field. We certainly see cases of observability. In fact, we have subsequently built actually a separate product on top of Timescale called Promscale, that is really used initially for Prometheus metrics, but more broadly, to make it easier to store observability data with TimescaleDB. We see still a lot of IoT. We see a lot of logistics. We see financial data and crypto data. We see event sourcing. We see product and user analytics.
We see people collecting data about how users are using their SaaS platforms. We see gaming analytics, where companies are collecting information about how people's virtual avatars are actually playing within the games. We see music analytics. We like to think of the old way: to find the pop stars, you went down to the smoky club. Now you collect SoundCloud and Spotify streams, and you use that to identify who the next breakout artist is going to be. All of these are examples of time series data. It's really what's so exciting to us, as it's such a broad use case, so horizontal, because basically, it's all about collecting data at the finest granularity you can.

Tell me about the initial architecture for TimescaleDB. You're based off of PostgreSQL. What was the reasoning around that decision?

I think, as you point out, Timescale is actually implemented as an extension on PostgreSQL. Starting maybe 10 or 15 years ago, PostgreSQL started exposing low-level hooks throughout its code base. This is not a plugin where you're running a little JavaScript code. We have function pointers into, we get function hooks into the C. PostgreSQL is written in C, and so TimescaleDB is, for the most part, written in C. We have hooks throughout the code base at the planner, at sometimes in the storage, at the execution nodes. We are able to insert ourselves and do a lot of optimizations as part of the same process. You could ask the question of why not just implement a new database from scratch? Why build it on top of PostgreSQL? I think this really gets to that we always viewed ourselves as, and we hear this from our users and community all the time, that they are storing critical data inside TimescaleDB, and they need it to, A, work and be reliable. They also need it to be, they have a lot of use case requirements. It's not this, again, narrow thing where you're collecting one metric, and all you're asking to do is figure out the min-max average of a certain metric. You want to do fancy analysis. You want to do joins. You want to do subqueries. You want to do correlations. You want to have views. You want the operational maturity of a database. You want transactions, backup and restore, and all of the replication and all of the above. Some people say it takes maybe 10 years, at least, to build a reliable database. We thought this was a great way to immediately gain that level of reliability. We ourselves are huge fans of PostgreSQL. It has such a great community. It also has such a large ecosystem. The idea is that effectively, that entire ecosystem would work for us on day one. That means all of the tooling, all of the ORMs, all of your libraries would just work. We support full SQL, not SQL-ish. If you know how to use SQL, you could start using it, and if your tools speak SQL, if you're running Tableau, if you're running Power BI, if you're running Grafana, if you're running Superset, those all just start working on day one. Now, the second part of it is, well, what does it mean to build a time series database on top of PostgreSQL, which clearly was designed more as a traditional transactional database, an OLTP engine? Sometimes they talk about, you think about this architecturally. What I mean by that is you somewhat think about what your workloads look like and what that would mean from a software architecture. Maybe I'll give you a very concrete example.
Starting maybe 10 or 15 years ago, if you look at traditional databases, you started seeing the growth of what people commonly now call log structured merge trees, LSMs. This is a data structure that goes back to the mid-90s, but I think you first saw Google, Jeff Dean and Sanjay Ghemawat, build something called LevelDB. The whole idea of an LSM tree was, if you look at a workload that has a lot of updates, so with a lot of e-commerce applications, with a lot of social networks, you're constantly updating things. In a traditional database, if you think about a disk, if you're doing a lot of in-place updates, and these updates are randomly distributed across all of your user IDs, this means that you're going to cause your disk to do a lot of random writes. On hard drives, that's particularly bad. You need to move the disk. Even on SSDs, it doesn't do great, because SSDs still do a lot better with sequential writes than random writes, the way the internals of SSDs work. You started seeing this new type of database architecture called LSM trees emerge, because people wanted to build databases that had a lot faster updates. Time series databases, on the other hand, don't typically have these types of workloads. If you think of a stream of new observations with a timestamp, these are typically about what's happening now. It's typically about a stream of inserts that are about this stock price now, this stock price now, this stock price now, or different, or a 100, or a 1,000, or a 100,000 different sensors all recording what's happening right now. If you think about how you would then design the internals of your database and the data structures, when most of your writes are insert-heavy, and particularly about the latest time interval, then what that would mean is the somewhat internal structure of your data should reflect that. You should optimize your insert path to make it super-efficient to perform inserts on the latest time interval. It doesn't have to be perfectly in order, but it mostly is about what's happening recently, as opposed to what's happening a year ago. That said, Timescale absolutely allows you to backfill data and perform updates or deletes to older data. It's just, from a performance perspective, it keeps all the recent stuff in memory and builds a more efficient data structure to allow you to insert at much higher rates. For example, on a single machine, if you're collecting a stream of records, each with several, let's say 10, metrics, you'll be able to collect even up to 2 million metrics per second on a single, pretty standard machine. Then we see this again and again: the way we think about architecting Timescale is really thinking about what the workload looks like, that people often care about recent data. The way they want to manage their data changes as that data ages. They might want to optimize for even faster queries for the recent stuff. They might want to start reorganizing their data as it ages. They might want to start automatically aggregating the data as it ages, and dropping the raw data for the very old stuff to save space. All of these things are what you'd want in a good time series database, when it's not what you want from either a traditional OLTP database, nor a traditional data warehouse, or an analytical database, which doesn't think of this operational view of time series as so central to it.
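One concrete way to see the "extension with full SQL" point from this interview: to an application, TimescaleDB looks like plain Postgres, so the stock PostgreSQL JDBC driver works unchanged. Below is a minimal sketch, assuming the PostgreSQL JDBC driver is on the classpath and using made-up connection details; `create_hypertable` is the TimescaleDB function that turns an ordinary table into a time-partitioned one, but treat the exact arguments as something to verify against the Timescale docs for your version.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.sql.Timestamp;
import java.time.Instant;

public class HypertableDemo {
    public static void main(String[] args) throws Exception {
        // Plain PostgreSQL JDBC URL: the extension needs nothing special on
        // the client side. Host, database, and credentials are invented.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/metrics", "app", "secret");
             Statement st = conn.createStatement()) {

            st.execute("CREATE TABLE IF NOT EXISTS conditions (" +
                       "  time        TIMESTAMPTZ NOT NULL," +
                       "  device_id   TEXT        NOT NULL," +
                       "  temperature DOUBLE PRECISION)");

            // TimescaleDB call: split the table into time-based chunks, so
            // inserts for "now" land in a small, hot chunk -- the insert-path
            // optimization Mike describes above.
            st.execute("SELECT create_hypertable('conditions', 'time', " +
                       "if_not_exists => TRUE)");

            // Inserts are ordinary SQL; recent timestamps hit the latest chunk.
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO conditions VALUES (?, ?, ?)")) {
                ins.setTimestamp(1, Timestamp.from(Instant.now()));
                ins.setString(2, "sensor-42");
                ins.setDouble(3, 21.5);
                ins.executeUpdate();
            }
        }
    }
}
```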

BadGeek
Les Cast Codeurs No. 240 of 17/10/20 - LCC 240 - Time zones: the bug fountain (87min)

BadGeek

Play Episode Listen Later Oct 17, 2020 87:54


This episode is dedicated to the Java language, and version 15 in particular. We also discuss the impact of faulty memory on the JVM, reactive programming, backend frameworks, and plenty more. And we have two crowdcasts!

Recorded October 13, 2020. Episode download: [LesCastCodeurs-Episode-240.mp3](https://traffic.libsyn.com/lescastcodeurs/LesCastCodeurs-Episode-240.mp3)

## News

### Languages

[Java 15](https://twitter.com/java/status/1305874025231650817) (more details from [Remi Forax](https://groups.google.com/d/msgid/lescastcodeurs/63086933.1723204.1600182952350.JavaMail.zimbra%40u-pem.fr)):
* Unicode 13,
* hidden classes (non-discoverable implementation details of a framework, e.g. classes generated at runtime, aggressive unloading),
* TreeMap performance improvements,
* revocation checking for the jar signer,
* SHA-3 support,
* Nashorn removed,
* biased locking disabled/deprecated,
* ZGC ready for production,
* G1 ergonomics improvements (we had talked about that).

[Azul covers JDK 15](https://www.azul.com/jdk-15-release-64-new-features-and-apis/), with a focus on sealed classes, records, and hidden classes.

[Time zone bugs you can run into in your applications](https://blog.davidojeda.dev/4-time-zone-bugs-i-ran-into):
* Running your code on an old JRE/JDK version that isn't up to date on time zone data.
* The server may be in your own time zone, or on UTC, so watch out!
* When storing dates in a database, you also have to take into account the time zone configured in the DB!
* Sometimes on the display side, too, you can forget to parse dates with their time zone info, or on the client side the time zone or daylight saving time can be misconfigured...

Recordings from Oracle Developer Live Java.

[JVM crashes are often memory errors.](https://shipilev.net/jvm/test-your-memory/) Why?
* The JVM is often given a lot of memory.
* The metadata used by the GC is substantial.
* That metadata is accessed in its entirety during a full GC.

Romain Manni-Bucau explains [how to configure Java Util Logging with a single-line formatter](https://rmannibucau.metawerx.net/post/jul-production-ready-pattern-with-simpleformatter), to make it nicer in a Docker production context (a short sketch follows these show notes).

[A report on the Groovy ecosystem](https://e.printstacktrace.blog/groovy-ecosystem-usage-report-2020/):
* most used tool: Gradle (for builds).
* most popular library: Spock (for testing).
* most popular framework: Grails.
* a lot of use of Groovy as a scripting and automation language, but also as a main general-purpose language and as a Domain-Specific Language.

### Libraries

[JUnit 5.7 released](https://twitter.com/junitteam/status/1305140909223411712):
* Isolated tests.
* New Enabled/DisabledIf execution conditions.
* Custom disabled reasons.
* New MethodOrderer.DisplayName.
* New DisplayNameGenerator.Simple.
* Java Flight Recorder support.
* Improved EngineTestKit.

[Quarkus vs Spring Boot in increasingly constrained environments, but what is the team's choice in the end?](https://medium.com/swlh/hail-to-the-new-king-or-not-295090a96bbf)

[Micronaut 2.1 released](https://micronaut.io/blog/2020-10-05-micronaut-21-released.html):
* new Gradle plugin that makes it easier to build GraalVM native images and layered Docker containers.
* support for Oracle Cloud Functions.
* improved Google Cloud Platform support, with Stackdriver structured logging,
* and native messaging support with Google Cloud Pub/Sub.
* More info in the docs, in the what's-new section: https://docs.micronaut.io/2.1.0/guide/index.html#whatsNew

[Quarkus 1.8](https://quarkus.io/blog/quarkus-1-8-0-final-released/) released (and we hadn't announced 1.7):
* multiple persistence units
* Micrometer
* jbang integration
* GraalVM 20.2
* MongoDB for Kotlin
* Elasticsearch REST client (1.7)
* Vert.x Redis client (1.7)
* Hibernate Envers (1.7)
* DB2 (1.7)

### Infrastructure

[NVidia buys ARM from SoftBank for $40B](https://www.engadget.com/nvidia-arm-acquisition-softbank-000846113.html); that will make quite a competitor for Intel and AMD.

[Lessons learned from using Kubernetes.](https://medium.com/better-programming/3-years-of-kubernetes-in-production-heres-what-we-learned-44e77e1749c8) Java and its problems (especially 8, still heavy memory consumption), upgrading Kubernetes (they create new clusters), fixing an index at startup vs the liveness probe, exposing external IPs, and the limit on parallel connections.

[Project Natick: a datacenter Microsoft sank underwater in 2018](https://www.theverge.com/2020/9/14/21436746/microsoft-project-natick-data-center-server-underwater-cooling-reliability).
* 864 servers, 27.6 petabytes of storage, 117 feet down on the ocean floor (Scotland).
* According to Microsoft, it's a success.

[Google is carbon neutral, and has even bought back its carbon debt since its founding](https://blog.google/outreach-initiatives/sustainability/our-third-decade-climate-action-realizing-carbon-free-future/).
* (1) "We were the first major company to become carbon neutral in 2007." → Google has been "neutral" through offsets every year since 2007, notably by buying as much renewable energy as carbon-based energy.
* (2) "We were the first major company to match our energy use with 100 percent renewable energy in 2017... We're eliminating our entire carbon legacy, effective today." → Since 2017, Google has additionally bought as much renewable energy in year N as carbon-based energy consumed in year N+1. In September 2020, Google finally became fully "neutral", having consumed as much renewable energy as carbon-based energy since Google's founding.
* (3) "We are the first major company to make a commitment to operate on 24/7 carbon-free energy in all our data centers and campuses worldwide... by 2030." → In 10 years, Google hopes to no longer consume any carbon-based energy at all.

### Data

[Crunchy Data offers a cross-cloud PostgreSQL as a service](https://info.crunchydata.com/blog/announcing-crunchy-bridge-a-modern-postgres-as-a-service).

### Tooling

[GitHub releases version 1.0 of its command-line tool for managing GitHub projects](https://github.blog/2020-09-17-github-cli-1-0-is-now-available/).

### Architecture

An old one: [the Bezos memo on the service-oriented company](https://gist.github.com/chitchcock/1281611).

[Jonas Bonér announces the 8 reactive principles](https://principles.reactive.foundation/):
* I. Stay Responsive: Always respond in a timely manner.
* II. Accept Uncertainty: Build reliability despite unreliable foundations.
* III. Embrace Failure: Expect things to go wrong and design for resilience.
* IV. Assert Autonomy: Design components that act independently and interact collaboratively.
* V.
Tailor Consistency: Individualize consistency per component to balance availability and performance.
* VI. Decouple Time: Process asynchronously to avoid coordination and waiting.
* VII. Decouple Space: Create flexibility by embracing the network.
* VIII. Handle Dynamics: Continuously adapt to varying demand and resources.

### Methodologies

[Red Hat's recommendations to Red Hatters about contributing to Open Source](https://www.redhat.com/en/about/open-source/participation-guidelines)

### Security

[Jenkins ships with plenty of security fixes](https://groups.google.com/g/jenkinsci-advisories), as it does every month (or even twice a month).

### Law, society and organization

[Is Digital Ocean's Hacktoberfest hurting Open Source?](https://blog.domenic.me/hacktoberfest/)
* lots of people contribute worthless commits just to win a t-shirt, and it's the open source project maintainers who have to wade through all the pull requests like spam messages
* [DigitalOcean update to reduce spam](https://hacktoberfest.digitalocean.com/hacktoberfest-update)
* [How one person (a YouTuber with 600K followers) ruined the system](https://joel.net/how-one-guy-ruined-hacktoberfest2020-drama)

[Bye bye Stop Covid, which is becoming Alerte Covid.](https://www.europe1.fr/societe/information-europe-1-la-nouvelle-appli-stop-covid-sappellera-alerte-covid-3997914) The app does nothing to solve the efficiency and privacy problems already criticized in the past, but wants to add new uses, notably targeting use in bars and restaurants and lowering the exposure duration used there as a contact indicator. It should also be able to notify you of local alerts (the government in your pocket).

## Tools of the episode

A 49" 32:9 monitor.

## Beginner's corner

If you're starting out with Docker, it's important to [understand the differences between the RUN, CMD, and ENTRYPOINT instructions in your Dockerfiles](https://www.baeldung.com/devops/dockerfile-run-cmd-entrypoint).
* RUN is executed when the image is built.
* CMD is the default instruction launched when your image starts.
* ENTRYPOINT allows more flexibility than CMD by supporting input parameters.

## Conferences

[Codeurs En Seine 2020 - online edition](https://twitter.com/codeursenseine/status/1301064575786405888?s=21)
* In November, Tuesdays at 7 p.m. and Thursdays at 9 p.m.
* 45 minutes of talks + about 15 minutes of questions
* Online on Twitch + YouTube replay

Crowdcast from Emmanuel Demey about upcoming conferences in the North of France:
* Cloud Nord on 19/10, remote
* Web Stories on 5/2, in person (for now)
* Devfest Lille on 11/6, in person

## Contact us

Support Les Cast Codeurs on Patreon. [Do a crowdcast or a crowdquestion](https://lescastcodeurs.com/crowdcasting/). Contact us via Twitter, on the Google group, or on the website.
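The java.util.logging item in the News section above boils down to a single property on `SimpleFormatter`. A minimal sketch follows; the format string here is an illustrative example rather than the one from Romain's post, and the placeholder meanings are documented in the `SimpleFormatter` Javadoc.

```java
import java.util.logging.Logger;

public class OneLineJul {
    public static void main(String[] args) {
        // Must be set before the first SimpleFormatter is created; in
        // production you would pass it as a -D flag on the command line
        // or set it in logging.properties instead.
        // Placeholders: %1 = date, %3 = logger name, %4 = level,
        // %5 = message, %6 = throwable (empty if none).
        System.setProperty("java.util.logging.SimpleFormatter.format",
            "%1$tFT%1$tT.%1$tL %4$-7s [%3$s] %5$s%6$s%n");

        Logger log = Logger.getLogger(OneLineJul.class.getName());
        log.info("single-line, container-friendly log output");
        // e.g. 2020-10-17T10:15:30.123 INFO    [OneLineJul] single-line...
    }
}
```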

IGeometry
Episode 127 - PostgreSQL 12 has some interesting new features, Is it worth the upgrade?

IGeometry

Play Episode Listen Later Feb 8, 2020 16:02


PostgreSQL version 12 has been released; let's go through the features that I think are most interesting and cool. #softwarenews

Feature Matrix: https://www.postgresql.org/about/featurematrix/
- Allow adding columns to an index (GiST): https://www.postgresql.org/about/featurematrix/detail/314/
- COPY FROM ... WHERE
- More native support for JSON objects: https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH
- REINDEX CONCURRENTLY (slower, but allows writes): https://www.postgresql.org/docs/12/sql-reindex.html#SQL-REINDEX-CONCURRENTLY
- Performance on large partitioned tables
- Stored generated columns (see the sketch below)

--- Send in a voice message: https://anchor.fm/hnasr/message
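Of the features listed above, stored generated columns are the easiest to show in a few lines. Here is a hedged sketch over JDBC, assuming the PostgreSQL JDBC driver is on the classpath and using made-up connection details; the `GENERATED ALWAYS AS ... STORED` column syntax is the PostgreSQL 12 feature itself.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GeneratedColumnsDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/shop", "app", "secret");
             Statement st = conn.createStatement()) {

            // PostgreSQL 12: 'total' is computed and stored by the server;
            // it cannot be written to directly.
            st.execute("CREATE TABLE IF NOT EXISTS order_lines (" +
                       "  price    NUMERIC NOT NULL," +
                       "  quantity INTEGER NOT NULL," +
                       "  total    NUMERIC GENERATED ALWAYS AS " +
                       "           (price * quantity) STORED)");

            // Only the base columns are inserted; the server fills in total.
            st.execute("INSERT INTO order_lines (price, quantity) " +
                       "VALUES (9.50, 3)");

            try (ResultSet rs = st.executeQuery(
                    "SELECT total FROM order_lines")) {
                while (rs.next()) {
                    System.out.println("total = " + rs.getBigDecimal(1)); // 28.50
                }
            }
        }
    }
}
```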

The Podlets - A Cloud Native Podcast
[BONUS] A conversation with Joe Beda (Ep 6)

The Podlets - A Cloud Native Podcast

Play Episode Listen Later Nov 22, 2019 47:21


For this special episode, we are joined by Joe Beda, who is currently Principal Engineer at VMware. He is also one of the founders of Kubernetes from his days at Google! We use this open-table discussion to look at a bunch of exciting topics from Joe's past, present, and future. He shares some of the invaluable lessons he has learned and offers some great tips and concepts from his vast experience building platforms over the years. We also talk about personal things like stress management, avoiding burnout, and what is keeping him up at night with excitement and confusion! Large portions of the show are obviously spent discussing different aspects and questions about Kubernetes, including its relationship with etcd and Docker, its reputation as a very complex platform, and Joe's thoughts on investing in the space. Joe opens up on some interesting new developments in the tech world, and his wide-ranging knowledge is so insightful and measured, you are not going to want to miss this! Join us today, for this great episode!

Follow us: https://twitter.com/thepodlets
Website: https://thepodlets.io
Feedback: info@thepodlets.io — https://github.com/vmware-tanzu/thepodlets/issues
Special guest: Joe Beda
Hosts: Carlisia Campos, Bryan Liles, Michael Gasch

Key Points From This Episode:
* A quick history of Joe and his work at Google on Kubernetes.
* The one thing that Joe thinks sometimes gets lost in translation on these topics.
* Lessons that Joe has learned in the different companies where he has worked.
* How Joe manages mental stress and maintains enough energy for all his commitments.
* Reflections on Kubernetes' relationship with and usage of etcd.
* Is Kubernetes supposed to be complex? Why are people so divided about it?
* Joe's experience as a platform builder and the most important lessons he has learned.
* Thoughts for venture capitalists looking to invest in the Kubernetes space.
* Joe's thoughts on a few different recent developments in the tech world.
* The relationship between Kubernetes and Docker and possible ramifications of this.
* The tech that is most exciting and alien to Joe at the moment!

Quotes:
“These things are all interrelated. At a certain point, the technology and the business and career and work-life – all those things really impact each other.” — @jbeda [0:03:41]
“I think one of the things that I enjoy is actually to be able to look at things from all those various different angles and try and find a good path forward.” — @jbeda [0:04:19]
“It turns out that as you bounced around the industry a little bit, there's actually probably more alike than there is different.” — @jbeda [0:06:16]
“What are the things that people can do now that they couldn't do pre-Kubernetes?
Those are the things where we're going to see the explosion of growth.” — @jbeda [0:32:40]
“You can have the most beautiful technology, if you can't tell the human story about it, about what it does for folks, then nobody will care.” — @jbeda [0:33:27]

Links Mentioned in Today’s Episode:
The Podlets on Twitter — https://twitter.com/thepodlets
Kubernetes — https://kubernetes.io/
Joe Beda — https://www.linkedin.com/in/jbeda
Eighty Percent — https://www.eightypercent.net/
Heptio — https://heptio.cloud.vmware.com/
Craig McLuckie — https://techcrunch.com/2019/09/11/kubernetes-co-founder-craig-mcluckie-is-as-tired-of-talking-about-kubernetes-as-you-are/
Brendan Burns — https://thenewstack.io/kubernetes-co-creator-brendan-burns-on-what-comes-next/
Microsoft — https://www.microsoft.com
KubeCon — https://events19.linuxfoundation.org/events/kubecon-cloudnativecon-europe-2019/
re:Invent — https://reinvent.awsevents.com/
etcd — https://etcd.io/
CosmosDB — https://docs.microsoft.com/en-us/azure/cosmos-db/introduction
Rancher — https://rancher.com/
PostgreSQL — https://www.postgresql.org/
Linux — https://www.linux.org/
Babel — https://babeljs.io/
React — https://reactjs.org/
Hacker News — https://news.ycombinator.com/
BigTable — https://cloud.google.com/bigtable/
Cassandra — http://cassandra.apache.org/
MapReduce — https://www.ibm.com/analytics/hadoop/mapreduce
Hadoop — https://hadoop.apache.org/
Borg — https://kubernetes.io/blog/2015/04/borg-predecessor-to-kubernetes/
Tesla — https://www.tesla.com/
Thomas Edison — https://www.biography.com/inventor/thomas-edison
Netscape — https://isp.netscape.com/
Internet Explorer — https://internet-explorer-9-vista-32.en.softonic.com/
Microsoft Office — https://www.office.com
VB — https://docs.microsoft.com/en-us/visualstudio/get-started/visual-basic/tutorial-console?view=vs-2019
Docker — https://www.docker.com/
Uber — https://www.uber.com
Lyft — https://www.lyft.com/
Airbnb — https://www.airbnb.com/
Chromebook — https://www.google.com/chromebook/
Harbour — https://harbour.github.io/
Demoscene — https://www.vice.com/en_us/article/j5wgp7/who-killed-the-american-demoscene-synchrony-demoparty

Transcript

BONUS EPISODE 001

[INTRODUCTION]

[0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically minded decision maker, this podcast is for you.

[EPISODE]

[0:00:41.9] CC: Hi, everybody. Welcome back to The Podlets. We have a new name. This is our first episode with a new name. Don’t want to go much into it, other than we had to change from The Kubelets to The Podlets, because The Kubelets conflicts with an existing project and we thought it was just better to change. The show, the concept, the hosts, everything stays the same. I am super excited today, because we have a special guest, Joe Beda, and Bryan Liles, Michael Gasch. Joe, just give us a brief introduction. The other hosts have been on the show before. People should know about them. Everybody should know about you too, but there's always newcomers in the space, so give us a little bit of a background.

[0:01:29.4] JB: Yeah, sure. I'm Joe Beda. I was one of the founders of Kubernetes back when I was at Google, along with Craig McLuckie and Brendan Burns, with a bunch of other folks joining on soon after.
I'm currently Principal Engineer at VMware, helping to cover all things Kubernetes and Tanzu related across the company. I came into VMware via the acquisition of Heptio, where Bryan's wearing the shirt today. Left Google, did that with Craig for about two years. Then it's almost a full year here at VMware. We're at 11 months officially as of two days ago. Yeah, really excited to be here.

[0:02:12.0] CC: Yeah, I am so excited. Your name is Joe Beda. I always say Joe Beda.

[0:02:16.8] JB: You know what? It's four letters and it's easy – it's amazing how many different ways there are to pronounce it. I don't get picky about it.

[0:02:23.4] CC: Okay, cool. Well, today I learned. I am very excited about this show, because basically, I get to ask you anything I want.

[0:02:35.9] JB: I'll do my best to answer.

[0:02:37.9] CC: Yeah. You can always not answer. There are so many interviews of you out there on YouTube, podcasts. We are going to try to do something different. Let me fire the first question I have for you. When people interview you, they ask you, yeah, the usual questions, the questions that are very useful for the community. What I want to ask you is this: what are people asking you that you think are the wrong questions?

[0:03:08.5] JB: I don't think there's any bad questions like this. I think that there's a ton of interest when we're talking about technical stuff at different parts of the Kubernetes stack. I think that there's a lot of business context around the container ecosystem and the companies, and around forming Heptio, all that. A lot of times, I'll have discussions around career and what led me to where I'm at now. I think those are all really interesting things to talk about. The one thing that I think doesn't always come across is that these things are all interrelated. At a certain point, the technology and the business and career and work-life – all those things really impact each other. I think it's a mistake to try and take these things in isolation. There's a ton of bleed-over. I think one of the things that we tried to do at Heptio, and I think we did a good job, is recognize that for anybody senior enough inside of any organization, they really have to be able to play all roles, right? At a certain point, everybody is a business person, fundamentally, in terms of actually moving the ball forward for the company, for the business as a whole. Yeah. I think one of the things that I enjoy is actually to be able to look at things from all those various different angles and try and find a good path forward.

[0:04:28.7] BL: All right. Taking that, so you've gone from big co to big co, to VC to small co to big co. What has that unique experience taught you, and what can you share with us?

[0:04:45.5] JB: Bryan, you know my resume better than I do, apparently. I started my career at Microsoft and cut my teeth working on Internet Explorer and doing client-side stuff there. I then went to Google in the office up here in Seattle. It was actually in Kirkland, this little hole-in-the-wall, temporary office, pre-WeWork type of thing. I'm thinking, "Hey, I want to do some server-side stuff." Worked on Google Talk, worked on ads, worked on cloud, started Kubernetes, was a little burned out. Took some time off, goofed off. Did this entrepreneur-in-residence thing for a VC and then started Heptio and then sold to VMware.
[0:05:23.7] JB: When you're in a big company, especially when you're more junior, it's easy to get caught up in playing the game inside of that company. When I say the game, what I mean is that there are measures of success within big companies and there are ways to advance, see approval, see rewards, that are all very specific to that company. I think the culture of a company is really defined by what are the parameters and what are the successes, the success factors for getting ahead inside of each of those different companies. I think a lot of times – especially when I was at Microsoft straight out of college; I did a couple internships at Microsoft and then joined – leaving Microsoft that first time was actually really, really difficult, because there is this fear of, "Oh, my God. Everything's going to be super different." It turns out that as you bounced around the industry a little bit, there's actually probably more alike than there is different. The biggest difference I think between large company and small company is really – and I'll throw out some science analogies here. I think oftentimes organizations are a little bit like the ideal gas law. Okay, maybe going past y'all, but this is – PV = nRT. Pressure times volume equals number of molecules times temperature, and the R is a constant. The idea here is that this is an equation where, as you add more molecules to a constrained space, that will actually change the temperature and the pressure, and these things all rise. What happens inside of a large company is you end up with so many people within a constrained space in terms of the product space. When you add more people to the organization, or when you're looking to get ahead, it feels very zero-sum. It very much feels like, "Hey, for me to advance, somebody else has to lose." That's not how the real world works, but oftentimes that's how it feels inside of the big company – it feels zero-sum like that. The liberating thing about being at a startup, and I think why so many people get addicted to working at startups, is that startups are fundamentally not zero-sum. Everybody succeeds and fails together. When a new person shows up, your thought process is naturally like, "Awesome, we got more cylinders in the engine. We're going to go faster," which is not always the case inside of a big company. Now, I think as you get senior enough, all of a sudden these things change, because you're not just operating within the confines of that company. You're actually, again, playing a role in the business; you're looking at the ecosystem, you're looking at the community, you're looking at the competitive landscape, and that's where you have your eye on the ball and that's what defines success for you – not the internal company metrics, but really the business metrics is what defines success for you. The thing that I'm trying to do here at VMware now, as we do Tanzu, is make sure that we recognize the unbounded possibilities in front of us inside of this world, make sure that we actually focus our energy on serving customers. In doing so, out-compete others in the market. It's not a zero-sum game; it's not something where, as we bring more folks on, we feel we're competing with them. That's a little rambling of an answer. I don't know if that links together for you, Bryan.

[0:08:41.8] BL: No, no. That was pretty good.

[0:08:44.1] JB: Thanks.

[0:08:46.6] MG: Joe, that's probably going to be a context switch now. You touched on the time when you went through the burnout phase.
Then last week, I think you put out a tweet on there being so much stuff going on – you know which tweet I'm talking about. Yeah. In the Kubernetes community, you're a rock star. At VMware, you're already a rock star, being on stage at VMware, shaking hands with Pat. I mean, there's so many people, so many e-mails, so many Slacks, whatever, that you get every day, but still I feel you are able to keep the balance, stay grounded and always have a chat, even though sometimes I don't want to approach you, but sometimes I do when I have some crazy questions maybe. Still, you're not pushing people away. How do you manage mental stress, preventing another burnout? What is the secret sauce here? Because I feel I need to work on that.

[0:09:37.4] JB: Well, I mean, it's hard. The tweet that I put out was last week, when I was coming back from Barcelona and tired of travel. I'm looking forward to – right now, we're recording this just before KubeCon. Then after KubeCon, planning to go to re:Invent in Vegas, which is just a social denial-of-service. It's just overwhelming being with that. I was tired of traveling. I posted something and it came across a little stronger than I wanted it to – that I just hate people, right? I was at that point where it's just, you're traveling and you just don't want to deal with anybody, and every little thing is really bugging you and annoying you. I think burnout is an interesting thing. For me – and I think there's different causes for different folks – number one is that it's always fascinating when you start a new job: your calendar is empty, your responsibilities are low. Then as you are successful and you integrate yourself into the organization, all of a sudden you find that you have more work than you have time to do. Then you hit this point where you try and, like, "I'm just going to keep doing it. I'm going to power through." Then you finally hit this point where you're like, "This is just not humanly possible." Then you go into a triage mode and then you have to decide what's important. I know that there's more to be done than I can do. I have to be very thoughtful about prioritizing what I'm doing. There's a lot of techniques that you can bring to bear there. Being explicit about what your goals are and what your priorities are, writing those things down, whether it's an OKR process, or whether it's just, here's my top three things that I'm focusing on. Making sure that those things are purposefully meaningful to you, right? Understanding the difference between urgent and important – these are business-booky types of things, but it's this idea that there are things that feel like they have to get done right now, and then there are things that are long-term important. If you're not thoughtful about how you do things, you spend all your time doing the urgent things, but you never get to the stuff that's actually long-term important. That's a really easy trap to get yourself into. Finding ways to delegate to folks is really, really helpful here, in terms of empowering others, trusting them. It's hard to let go sometimes, but I think being able to set the stage for other people to be successful is really empowering. Then just recognizing it's not all going to get done, and that's okay. You can't hold yourself to expect that.
Now, with respect to burnout: for me, the biggest driver for burnout in my career has been when I felt personal responsibility over something, but I haven't had the tools, or the authority, or the ability to impact it. When you feel in your bones ownership over something, but yet you can't actually really own it, that is what causes burnout for me. I think there are studies talking about how the worst job is middle management. I think it's not being the CEO. It's not being new to the organization, being junior. It's actually being stuck in the middle. Because you're given a certain amount of responsibility, but you aren't always given the tools necessary to be able to drive that. Whereas the folks at the top, oftentimes they don't have those constraints, so they actually own stuff and have agency to be able to take care of it. I think when you're starting out more junior in the organization, the scope of ownership that you feel is relatively minor. That being stuck in the middle is the biggest driver for me for burnout. A big part of that is just recognizing that sometimes you have to take a step back and personally divest that feeling of ownership when really it's not yours to own. I'll give you an example: I started Google Compute Engine at Google, which is arguably the foundational cloud service for GCP. As it grew, as it became more important to Google, as it got reorged, more and more of the leadership and responsibilities and decision-making – I'm up here in Seattle – moved down to Mountain View. A lot of that stuff was focused on folks who hadn't been in the cloud market, but had been at Google for 10 or 15 years, coming in, and they're like, "Okay, that's cute. We got it from here," right? That was a case where it was my thing. I felt a lot of ownership over it. It was clear after a certain amount of time: hey, you know what? I just work here. I'm just doing my job and I do what I do, but really it's these other folks that are driving the bus. That's a painful transition, to actually go from that feeling of ownership to, I just work here. That I think is one of the reasons why, oftentimes, people leave companies. I think that was one of the big drivers for why I ended up leaving Google – that lack of agency to be able to impact things that I cared about quite a bit.

[0:13:59.8] CC: I think that's one reason why – well, I think that working in companies where things are moving fast, because they have a very clear, very worthwhile goal, provides you the opportunity to just have so much work that you have to say no to a lot of things, like you were saying, and also take ownership of pieces of that work, because there's more work to go around than people to do it. For example, since Heptio and VM – okay, I'm plugging. This is a big plug for VMware, I guess, but it definitely is a place that's moving fast. It's not crazy. It's reasonable, because pretty much every one of us is grown up. There is so much to do and people are glad when you take ownership of things. That really, for me, is a big source of work satisfaction.

[0:14:51.2] JB: Yeah. I think it's that zero-sum versus positive-sum game. I think that when you – there's a lot more room for you to actually feel that ownership, have that agency, have that responsibility when you're in a positive-sum environment, versus a zero-sum environment.

[0:15:04.9] BL: All right, so now I want to ask you a technical question.

[0:15:08.1] JB: All right.

[0:15:09.5] BL: Not a really hard one. Just more of how you think about this.
Kubernetes is five, almost five and a half years old. One of the key components of Kubernetes is etcd. Now, knowing what we know in 2019, would you have used etcd as its key store? Or would you have gone another direction?

[0:15:32.1] JB: I think etcd is a good fit. The truth of the matter is that we didn't give that decision as much thought as we probably should have early on. We saw that it was relatively easy to stand up and get going with. At least on paper, it had the qualities that we were looking for, so we started building with it and then just ran with it. Something like ZooKeeper was also something we could have taken, but the operational overhead at the time of ZooKeeper was very different from etcd. I think we could have gone in the direction of what [inaudible 0:15:58.5] does for a lot of their tools, where they actually build the data store into the tool in a native way. I think that can lead in some ways to a simpler getting-started experience, because there's just one thing to boot up, but also it's more monolithic from a backup, maintenance, recovery type of thing. The one thing that I think we probably should have done there, in retrospect, is to try and create a little bit more of an arm's-length relationship between Kubernetes and etcd. In terms of having some cleaner interfaces, some more contracts and stuff, so that we could have actually swapped something else out. There's folks that are doing it, so it's not impossible, but it's definitely not something that's easy to do, or well-supported. I think that that's probably the thing that I would change in that space. Another thing we might want to change: I think it might have been good to be more explicit about being able to actually shard things out, so that you could have multiple data stores for multiple resources and actually find a way to horizontally scale. Now, we do that with events, because we were writing events into etcd and that's just a totally different stream of data, but everything else right now – I think now there's room to do this in the future. I think we've been able to push etcd vertically up until now. There will come a time where we need to find ways to shard that thing out horizontally.

[0:17:12.0] CC: Is it possible, though, to use a different data store than etcd for Kubernetes?

[0:17:18.4] JB: The things that I'm aware of here – and there may be more, and I may not be a 100% up to date – is, I do know that the Azure folks created a proxy layer that speaks the etcd protocol, but that is actually implemented on the backend using CosmosDB. That approach there was to essentially create a translation layer. Then Rancher created this project, which is a little bit of a fork of Kubernetes, where they're, I believe, using PostgreSQL as the database for Kubernetes. I haven't looked to see exactly how they ended up swapping that in. My guess is that there's some chewing gum and baling wire, and it's quite a bit of effort for each version upgrade to be able to actually adapt that moving forward. Don't know for sure. I haven't looked deeply.
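To make that key-store relationship a little more concrete, here is a minimal sketch of reading Kubernetes state straight out of etcd. This is an illustration rather than anything from the episode: it assumes the python-etcd3 client package and an etcd member reachable on localhost without TLS, and note that recent Kubernetes versions store values protobuf-encoded, so the raw bytes usually need Kubernetes' own codecs to become readable.

```python
# A minimal sketch of peeking at Kubernetes state in etcd.
# Assumptions: python-etcd3 is installed; etcd listens on localhost:2379.
import etcd3

client = etcd3.client(host="127.0.0.1", port=2379)

# Kubernetes keeps objects under /registry/<resource>/<namespace>/<name>.
# The pod name here is a hypothetical example.
value, metadata = client.get("/registry/pods/default/my-pod")
print(value)  # raw bytes; typically protobuf, not JSON, in recent versions

# Watching a key prefix is the primitive underneath Kubernetes watches.
events, cancel = client.watch_prefix("/registry/pods/")
for event in events:
    print(event)  # one Put or Delete event per change
    cancel()
    break
```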
[0:18:06.0] CC: Okay. Now I would love to philosophize a little bit, or maybe a lot, about Kubernetes. In the spirit of thinking of different questions to ask, I had a bunch of questions and then I was thinking, "How could I ask this question in a different way?" Maybe this is not the right "question." Here is the way I came up with this question. We're so divided out there. One camp loves Kubernetes; another camp: "So hard, so complicated, it's so complex. Why even bother with it? I don't understand why people are using this." Basically, there is that sentiment that Kubernetes is complicated. I don't think anybody would refute that. Now, is that even the right way to talk about Kubernetes? Is it even supposed to not be complicated? I mean, what kind of a tool is it that we are thinking it should just work, it should just be super simple? Is it true that it should be a super simple tool to use?

[0:19:09.4] JB: I mean, that's a loaded question [inaudible]. Let me just first say that, number one, if people are complaining – I mean, I'm stealing this from Tim [inaudible]; I think this is the way he takes some of these things in stride – if people are complaining, then you're relevant, right? If nobody is complaining, then nobody cares about what you're doing. I think that it's a good thing that folks are taking a critical look at Kubernetes. That means that they're taking a look at it, right? Five years in, Kubernetes is on an upswing. That's not going to necessarily last forever. I think we have work to do to continually earn Kubernetes' place in the technology stack over time. Now, that being said, Kubernetes is a super, super flexible tool. It can do so many things in so many different situations. It's used for everything from retail stores – across tens of thousands of stores – to any type of solution. People are looking at it for telco, 5G. People are looking at it to even run it inside cars, which scares me, right? Then all the way up to folks like at CERN using it to do data analytics for high-energy physics, right? The technology that I look at that's probably most comparable to that is something like Linux. Linux is actually scalable to everything from a phone, all the way up to an IBM mainframe, but it's not easy, right? I mean, to be able to adapt it across all those things, you have to essentially download the kernel, type make config and then answer 5,000 questions, right – for those who haven't done that. It's not an easy thing to do. I think that a lot of times, people might be looking at Kubernetes at the wrong level to be able to say this should be simple. Nobody looks at the Linux kernel that you get from git cloning Linus' fork and compiling it and says, "Yeah, this is too hard." Of course it's hard. It's the Linux kernel. You expect that you're going to have a curated experience if you want something easy, right? Whether that be an Android phone or Ubuntu or what have you. I think to some degree, we're still in the early days, where people are dealing with it perhaps at too raw a level, versus actually dealing with it in a more opinionated way. Now, I think the fascinating thing for Kubernetes is that it provides a lot of the extension points and patterns, so that we don't know exactly what those higher-level, easier-to-use abstractions are going to look like, but we know, or at least we're pretty confident, that we have the right tools and the right environment to be able to experiment our way there. I think we're not there yet, but we're set up for success. That's the first thing. The second thing is that Kubernetes introduces a whole bunch of different concepts and ideas, and these things are different and uncomfortable for folks. It's hard to learn new things. It's hard for me to learn new things and it's hard for everybody to learn new things.
When you compare Kubernetes to, say, getting started with the modern front-end web development stack, with things like Babel and React, and how do you deploy this and what are all these different options and it changes on a weekly basis – there's a hell of a lot in common actually between these two ecosystems. They're both really hard, they both introduce all these new concepts, and you have to be embedded in it to really get it. Now, that being said, if you just want to take raw JavaScript, or jQuery, and have at it, you can do it, and you'll see Hacker News articles every once in a while where people are like, "Hey, I've programmed my site with jQuery and it's just fine. I don't need all this new stuff," right? Just like you'll see folks saying, "I just SSH'd in and actually ran some stuff and it works fine. I don't need all this Kubernetes stuff." If that works for you, that's great. Kubernetes doesn't have to solve every problem for every person. Then the next thing is that I think that there's a lot of people who've been solving these problems again and again and again and again, but they've been solving them in their own way. It's not uncommon, when you look at back-end systems, to join a company, look at what they've built and find that it's a complicated, bespoke system of chewing gum and baling wire, with maybe a little bit of Ansible, maybe a little bit of Puppet and bash. Everybody has built their own complex, overwrought system to do a lot of the stuff that Kubernetes does. I think one of the values that we see here is that these things are complex – you need complexity to do it – but shared complexity is more valuable than personal complexity. If we can agree on some of these concepts, then that's something that can be leveraged widely, and it will fade to the background over time, versus having everybody invent their own complex system every time they need to solve these problems. With all that said, we got a ton of work to do. It's not like we're done here, and I'm not going to actually sit here and say Kubernetes is easy, or that every complex thing is absolutely necessary and that we can't find ways to simplify it. We clearly can. I just think that when folks say, "Hey, I just want this to be easy," I think they're being a little bit too naïve, because it's a very difficult problem domain.

[0:23:51.9] BL: I'd like to add on to that. I think about this a lot as well. Something that Joe said to me a few years back, where Kubernetes is the platform for creating platforms, is very applicable here. As an industry, we need to stop looking at Kubernetes as the destination. Your destination is really running your applications that give you pleasure, or make your business money. Kubernetes is a tool to enable us to think about our applications more, rather than the underlying ecosystem. We don't think about servers. We don't want to think about storage and networking, or even things like finding things in your cluster. You don't think about that. Kubernetes gives it to you. If we start thinking about Kubernetes as a way to enable us to do better things, we can go back to what Joe said about Linux. Back whenever I started using Linux in the mid-90s, guess what? We compiled it – make bzImage. That stuff was hard and it was slow. Now think about this: in my office I have three different Linux distributions running. You know what? I don't even think about it anymore. I don't think about configuring X. I don't think about anything.
One thing that is going to grow from Kubernetes is – we're going to figure out these problems, and it's going to allow us to think of these other crazy things, which is going to push the industry further. Think, maybe 20 years from now, if we're still running Kubernetes, who cares? It's just going to be there. We're going to think about some other problem and it could be amazing. These are good times.

[0:25:18.2] JB: At one point – sorry, the dog's going to bark here. I mean, at one point people cared about the BIOS that they were running on their computers, right? That was something that you stressed out about. I mean, back in the bad old days when I was doing DOS gaming and you're like, "Oh, well, this BIOS is incompatible with the –" IRQs and all that. It's just background now.

[0:25:36.7] CC: Yeah, I think about this too as a developer. I might have mentioned this before on this podcast. I have never gone from one job to another job and had to use the same deployment system. Every single job I've ever had, the deployment system is completely different – completely different set of tooling and completely different process. Just being able to walk from one job to another job and be able to use the same platform for deployment, it must be amazing. On the flip side, being able to hire people who join your organization already knowing how your deployment works – that has value in itself. It's a huge value that I don't think people talk about enough.

[0:26:25.5] JB: Well, honestly, this was one of the motivations for creating Kubernetes, is that I looked around Google early on, and Google was really good at importing open source, circa 2000, right? This is like, "Hey, you want to use libpng, or you want to use this library, or whatever." That was the type of open source that Google was really, really good at using. Then Google did things like, say, release the BigTable paper. Then somebody went through and created Cassandra out of it. Maybe there's some ideas in Cassandra that actually build on top of BigTable, or you're looking at MapReduce versus Hadoop. All of a sudden, you found that these things diverged, and Google had zero ability to actually import open source, circa 2010, right? It could not import those systems back, because the operational characteristics of these things were wholly alien when compared to something like Borg. You see this also – we would acquire companies, and it would take those companies way too long to be able to essentially re-platform themselves on top of Borg, because it was just so different. This is one of the reasons, honestly, why we ended up doing something like GCE: to actually have a platform that was more familiar for acquisitions. It's one of the reasons we did it. Then also, introducing Kubernetes – it's not Borg. It's a cousin of Borg inside of Google. For those who don't know, Borg is the container system that's been in production at Google for probably 15 years now, and the spiritual grandfather to Kubernetes in a lot of ways. A lot of the ideas that you learn from Kubernetes are applicable to Borg. It's not nearly as big a leap for people to actually change between them as it was before Kubernetes was out there.

[0:27:58.6] MG: Joe, I got a similar question, because it seems to me like you're a platform builder. You've worked on GCE, Kubernetes obviously. If you were talking to another platform architect or builder, what would be something that you would recommend to them based on your experiences?
What is a key ingredient, technically speaking, of a platform that you should be building today? Or the main thing, or the lesson learned that you had from building those platforms – like, technical advice, if you will?

[0:28:26.8] JB: I mean, that's a really good question. I think in my mind, the mark of a good platform is when people can use it to do things that you hadn't imagined when you were building it, right? The goal here is that you want a platform to be a force multiplier. You want to enable people to do amazing things. You compare, again, the Linux kernel, or even something as simple as our electrical grid, right? The folks who established those standards, God knows how long ago, right? 150 years ago or whenever – the whole Tesla versus Thomas Edison, [inaudible]. Nobody had any idea the long-term impact that would have on society over time. I think that's the definition of a successful platform in my mind. You got to keep that in mind, right? I think that a lot of times, people design for the first five minutes at the expense of the next five years. I've seen it a lot of times, where you design for, hey, I'm giving a presentation; I want to be able to fit something amazing on one slide. You do it, but then all of a sudden somebody wants to do something different. They want to go off course, they want to go off the rails, they want to actually experiment, and the thing is just brittle. It's like, "Hey, it does this. It doesn't do anything else. Do you want to do something else? Sorry, this isn't the tool for you." For me, I think that's a trap, right? Because it's easy to get early users based on that very curated experience. It's hard to keep those users as they actually start using the thing in anger, as they start interfacing with the real world, as they deal with things that you didn't think of as a platform. I'm always thinking about, how can everything that you put in the platform be used in multiple ways? How can you actually make these things be composable building blocks? Because then that gives you the opportunity for folks to actually compose them in ways that you didn't imagine starting out. I think that's some of it. I started my career at Microsoft working on Internet Explorer. The fascinating thing about Microsoft is that, through and through and through and through, Microsoft is a platform company. It started with DOS and Windows and Office – even Office is viewed as a platform inside of Microsoft. They fundamentally understand in their bones the benefit of actually starting that platform flywheel. It was really interesting to actually be doing this during the first browser wars of IE versus Netscape, when I started my own career, to actually see the fact that Microsoft always saw Internet Explorer as a platform, whereas I think Netscape didn't really get it in the same way, right? They didn't understand the potential, I think, in the way that Microsoft did. For me – I mean, just being where you start your career oftentimes actually sets your patterns in terms of how you look at things over time. I think a lot of this platform thinking comes from just imprinting when I was a baby developer, I think. I don't know. It takes a lot of time to really internalize that stuff.

[0:31:14.1] BL: The lesson here is a good one: when we're building things that are way bigger than us, don't think of your product as the end goal. Think of it as an enabler. When it's an enabler, that's where you get that X multiplier.
Then that's where you get all the residuals. Microsoft actually is a great example of it. My gosh. Just think of what Microsoft has been able to do with the power of Office!

[0:31:39.1] JB: Yeah. I look at something like VB in the Microsoft world. We still don't have VB for the cloud era. We still haven't created that. I think there's still opportunity there to actually strike. VB back in the day, for those who weren't there, struck this amazing balance of being easy to get started with, but also something that could actually grow with you over time, because it had all these extension mechanisms where you could actually – there were the marketplace controls that you could buy, you could partner with other developers that were writing C or C++. It was an incredible platform. Then they leveraged Office to extend the capabilities of VB. It's an amazing ecosystem. Sorry. I didn't mean to interrupt you, Bryan.

[0:32:16.0] BL: Oh, no. That's all good. I get as excited about it as you do whenever I think about it. It's a pretty exciting place to be.

[0:32:21.8] JB: Yeah. I'll talk to VCs, because I did a startup and the EIR thing, and I'll have them ask me things like, "Hey, where should we invest in the Kubernetes space?" My answer is, using the BS analogy, "You got to go where the puck is going." Invest in the things that Kubernetes enables. What are the things that people can do now that they couldn't do pre-Kubernetes? Those are the things where we're going to see the explosion of growth. It's not about Kubernetes. It's really about a larger ecosystem that Kubernetes is the seed crystal for.

[0:32:56.2] BL: For those of you listening, if you want to get anything out of here, rewind back about 20 seconds and play that over and over again – what Joe just said.

[0:33:04.2] MG: Yeah. This was brilliant.

[0:33:05.9] BL: It's where the puck is going. It's not where we are now. We're building for the future. We're not building for now.

[0:33:11.1] MG: I'm looking at these tweetable quotes here – the last 20 seconds, so many tweetable quotes. We have to decide which ones to tweet then.

[0:33:18.5] CC: Well, we'll tweet them all.

[0:33:20.0] MG: Oh, yes.

[0:33:21.3] JB: Here's another thing. Here's another piece of career advice. Successful people are good storytellers. You can have the most beautiful technology; if you can't tell the human story about it, about what it does for folks, then nobody will care. I spend a lot of time on Twitter – probably too much time, if you ask my family. That medium of being able to actually distill your thoughts down into something that is tweetable, quotable, really potent – that is a skill that's worth developing, and it's a skill that's worth valuing. Because there's things that are rolling around in my head and I still haven't found a way to get them into a tweet. At some point, I'll figure it out and it'll be a thing. It takes a lot of time to build that skill, to be able to refine like that.

[0:34:08.5] CC: I want to share an anecdote of my own. I interviewed at a small – tiny startup, maybe less than 10 people at the time, in Cambridge, back when I lived up there. The guy was borderline on wanting to hire me. I sent him an e-mail to try to influence his decision, and it was a long-ass e-mail. They said, "No, thank you." I think we had a good rapport, so I said, well, anything you can tell me about your decision then? He said something along the lines of, I was too verbose. That was pre-Twitter.
Twitter, I think, existed, but it was at the very beginning; I wasn't using it. Yeah, people: be concise. Decision-makers don't have time to read long things. You need to be able to convey your message in short sentences, few sentences. It's crucial.

[0:35:07.5] BL: All right, so we're nearing the end. I want to ask another question, because these are random questions for Joe. Joe, it is the week before KubeCon North America 2019, and today is actually an interesting day. A couple of neat things happened today. We had Docker. It was neat. Docker split somewhat and it sold part of itself, and now they're going to be a tools company. That's neat. We're all still trying to decode what that actually is. Here's the neat piece: Apple released a laptop that can have 64 gigabytes of memory.

[0:35:44.4] MG: Has an escape key.

[0:35:45.7] BL: It has an escape key.

[0:35:47.6] MG: This is brilliant.

[0:35:48.6] BL: Yeah. I think the question was, what do you think about that?

[0:35:52.8] JB: Okay. Well, so first of all – I mean, Docker is fascinating, and I think there's a lot of lessons there, and I'm not sure I'm the one to tell them. I think it's easy to armchair-quarterback these things. It's hard to live that story. I think that it's fun to play that what-if game. I think it does show that this stuff is hard. You can have everything in your grasp and then just have it all slip away. I think that's not anybody's fault. It's just, there's different strategies, different approaches in how this stuff plays out over time. On the laptop thing – I think my current laptop has 16 gigs of RAM. One of the things that we're seeing is that as we move towards a microservices world – I gave a talk about this probably three or four years ago – as we move to a microservices world, I think there's one stage where you create a bunch of microservices, but you still view those things as an app. You say, "This microservice belongs to this app." Within a mature organization, those things start to grow, and eventually what you find is that you have services that are actually useful for multiple apps. Your entire production infrastructure becomes this web of services that are calling each other. Apps are just entry points into these things at different points of that web of infrastructure. This is the way that things work at Google. When you see companies that are microservices-based – let's take an Uber, or Lyft, or an Airbnb – as they diversify the set of products that they're offering, you know they're not running completely independent stacks. You know that there's places where these things connect to each other behind the scenes in a microservices world. What does that mean for developers? What it means is that you can no longer fit an entire company's worth of infrastructure on your laptop. Within a certain constraint, you can go through and actually say, "Hey, I can bring up this canonical cut of microservices. I can bring that up on my laptop, but it will have dependencies that I either have to actually call into the prod dependencies, call into specialized staging, or mock those things out, so that I can actually run this thing locally and develop it." With 64 gig of RAM, I can run more on my laptop, right? There's a little bit of kicking that can down the road, in terms of, okay, there's this race between how microservicey we get versus how much I can fit on my laptop. The interesting thing is, where is this going to end? Are we going to have the ability to bring more and more onto your laptop?
Are you going to be able to run in this split-brain thing, where there's people who will create network connections between these things? Or are we going to move to a world where you're doing more development on-cluster, in the cloud, and your laptop gets thinner and thinner, right? Either you absolutely need 64 gig because you're pushing up against the boundaries of what you can do on your laptop, or you've given up and it's all running in the cloud – and at that point, you might as well just use a Chromebook. It's fascinating that we're seeing this divergence of scaling up versus actually moving stuff to the cloud. I can tell you, at Google a lot of folks, even developers, can actually be super, super productive with something relatively thin like a Chromebook, because there's so many tools there that really are targeted at doing all that stuff remotely, in Google's production data centers and such. That's, I think, the interesting implication from a developer point of view with 64 gigabytes of RAM. What are you going to do, Bryan? You're going to get the 64 gig Mac? You're going to do it?

[0:39:11.2] BL: It's already coming. They'll be here week after next.

[0:39:13.2] JB: You already ordered it? You are such an Apple fanboy. Oh, man.

[0:39:18.6] BL: Oh, I'm actually – so, not to go too much into it, I am a fan of lots of memory. You know what? We work in this cloud native world. Any given week, I'll work on four to five projects. I'm lazy. I don't want to shut any of them down. Now with 64 gigs, I don't have to shut anything down.

[0:39:37.2] JB: It was so funny. When I was at Microsoft, everybody actually focused on Microsoft Windows boot time. They're like, "We got to make it boot faster. We got to make it boot faster." I'm like, I don't boot that often. I just want the thing to resume from sleep, right? If you could make that reliable – things along that theme.

[0:39:48.7] CC: Yeah. I frequently have to restart my computer because of memory issues. I don't want to know which app is taking up memory. I have a tool that I can look it up with, but I just shut it down, flush the memory. I do have a question related to Docker. Kubernetes – I don't know if it's right to say that Kubernetes is so reliant on Docker, because I know it works with other container technologies as well. In the worst-case scenario – obviously, I have no reason to predict this – but in the worst-case scenario, where Docker, let's say, is discontinued, how would that affect Kubernetes?

[0:40:25.3] JB: Early on, when we were doing Kubernetes and you're in this relationship with a company like Docker, I looked at what Docker was doing and you're like, "Okay, where is the real value here over time?" In my mind, I thought that the interface with developers – that distributed kernel, that API surface area of Kubernetes – that was really the thing, and that a lot of the Docker stuff was, over time, going to fade to the background. I think we've seen that happen, because when we talk about production systems, we definitely have moved past Docker, and we have the CRI, we have containerd, which was essentially built by Docker, donated to the CNCF as it made its way towards graduation. I think it's graduated now. The governance ties to Docker have been severed at this point. In production systems for Kubernetes, we've moved past that. I still think that the developer experience is oftentimes reliant on Docker and things like Dockerfiles. I think we're moving past that also.
I think that if Docker were to disappear off the face of the earth, there would be some adjustment, but I think we have the right toolkits and the right systems to be able to do that. Some of that is open sourced by Docker as part of the Moby project. The whole Dockerfile evaluation flow is actually in this thing called BuildKit, which you can actually use in different contexts outside of the Docker engine. I think that's a lot of the build action. The thing that I think is most influential, that actually will stand the test of time, is the Docker container image format. That artifact that you upload, that you download, the registry APIs. Now those things have been codified and are moving forward slowly under the OCI, the Open Container Initiative project, which is a little bit of a sister foundation niche type of thing to the CNCF. I think that's the influence over time. Then, related to that, I think the world should be a little bit worried about Docker Hub and what that means for Docker Hub over time, because that is not a cheap service to run. It's done as a public good, similar to GitHub. If the commercial aspects of that are not healthy, then I think it might be disruptive if we see something bad happen with Docker Hub itself. I don't know what exactly the replacement for that would be overnight. That'd be incredibly disruptive.

[0:42:35.8] CC: Should be Harbour.

[0:42:37.7] JB: I mean, Harbour is a thing, but somebody's got to run it and somebody's got to pay the bandwidth bills, right? Thank you to Docker for paying those bandwidth bills, because it's actually been good for not just Docker, but for our entire ecosystem, to be able to do that. I don't know what that looks like moving forward. I think it's going to be – I mean, maybe GitHub, with GitHub artifacts, is going to pick up the slack. We're going to have to see.

[0:42:58.6] MG: Good. I have one last question from my end. Totally different topic, not Docker at all. Or maybe – depends on your answer to it. The question is: you're a very technical person. What is the technology, or the stuff, that your brain is currently spinning on, if you can disclose? Obviously, no secrets. What keeps you awake at night, in your brain?

[0:43:20.1] JB: I mean, I think – a couple of things. Stuff that's just completely different from our world, I think, is interesting. I think we've entered a place where programming computers and such is so specialized. Again, I talk about how, if you made me be a front-end developer, I would flail for several months trying to figure out how to even be productive, right? I think similarly, when we look at something like machine learning, there's a lot of stuff happening there really fast. I understand the broad strokes, but I can't say that I understand it to any deep degree. I think it's fascinating and exciting, the amount of diversity in this world and stuff to learn. Bryan's asked me in the past, it's like, "Hey, if you're going to quit and start a new career and do something different, what would it be?" I think I would probably do something like generative art, right? Essentially, there's folks out there writing these programs to generate art, a little bit of a moral descendant of the Demoscene. That was – I don't know. I wonder when the Demoscene happened, Bryan. When was that?

[0:44:19.4] BL: Oh, mid-90s, or early 90s.

[0:44:22.4] JB: That's right. I was never super into that. I don't think I was smart enough. It's crazy stuff.
[0:44:27.6] MG: I actually used to write demoscenes.

[0:44:28.8] JB: I know you did. I know you did. Okay, so just for those not familiar, the Demoscene was essentially – you wrote essentially x86 assembly code to do something cool on screen. It was all generated, so that the amount of code was vanishingly small. It was this puzzle/art/technical tour de force type of thing.

[0:44:50.8] BL: We wrote trigonometry in a similar way – that's literally what we did.

[0:44:56.2] JB: I think a lot of that stuff ends up being fun. Stuff that's related to our world – I think about how do we move up the stack, and I think a lot of folks are focused on the developer experience, how do we make that easier. I think one of the things, through the lens of VMware and Tanzu, is looking at how does this stuff start to interface with organizational mechanics. How does the typical enterprise work? How do we actually make sure that we can start delivering a toolset that works with that organization, versus working against the organization? That, I think, is an interesting area, where it's hard because it involves people. Back-end people like programmers, they love it because they don't have to deal with those pesky people, right? They get to define their interfaces, and their interfaces are pure and logical. I think that UI work, UX work – anytime you deal with people, that's the hardest thing, because you don't get to actually tell them how to think. They tell you how to think, and you have to adapt to it, which is actually different for a lot of back-end, logical types of folks. I think there's an aspect of that which is user experience at the consumer level. There's developer experience, and there's a whole class of things which is maybe organizational experience. How do you interface with the organization, versus just interfacing with individuals, whether from the developer or the end-user point of view? I don't know if, as an industry, we actually have our heads wrapped around those organizational limits.

[0:46:16.6] CC: Well, we have arrived at the end. Makes me so sad, because we could easily talk for two more hours.

[0:46:24.8] JB: Yeah, we could definitely keep going.

[0:46:26.4] CC: We're going to bring you back, Joe. Don't worry.

[0:46:28.6] JB: For sure. Anytime.

[0:46:29.9] CC: Or do worry. All right, so we are going to release these episodes right after KubeCon. Glad everybody could be here today. Thank you. Make sure to subscribe and follow us on Twitter. Follow us everywhere and suggest episode topics for us. Bye and until next time.

[0:46:52.3] JB: Thank you so much.

[0:46:52.9] MG: Bye.

[0:46:54.1] BL: Bye. Thank you.

[END OF EPISODE]

[0:46:55.1] ANNOUNCER: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the http://thepodlets.io/ website, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing.

[END]
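The registry API discussed in this episode is easy to poke at directly. Below is a minimal sketch of fetching an image manifest over the pull protocol that the OCI distribution spec standardizes; it assumes the Python requests package and uses Docker Hub's public token and registry endpoints, with an example image name chosen purely for illustration.

```python
# A sketch of pulling an image manifest over the standardized registry API.
# Assumptions: the requests package is installed; Docker Hub's public
# endpoints are reachable; library/alpine is just an example image.
import requests

name, ref = "library/alpine", "latest"

# Docker Hub issues a free, anonymous bearer token even for public pulls.
token = requests.get(
    "https://auth.docker.io/token",
    params={"service": "registry.docker.io",
            "scope": f"repository:{name}:pull"},
).json()["token"]

manifest = requests.get(
    f"https://registry-1.docker.io/v2/{name}/manifests/{ref}",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.docker.distribution.manifest.v2+json",
    },
).json()

# The manifest names the config blob and the layer digests of the image.
print(manifest["mediaType"], len(manifest.get("layers", [])), "layers")
```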

Diva Tech Talk Podcast
Ep 75: Scarlett Ong Rui Chern: Passion Plus Perseverance

Diva Tech Talk Podcast

Play Episode Listen Later Nov 14, 2018 41:52


Diva Tech Talk interviewed Scarlett Ong Rui Chern, entrepreneur, founder/CEO of Peerstachio. Scarlett is the epitome of entrepreneurship: courageous, persistent, agile in approach. She grew up in a small town in Malaysia and left to pursue higher education in the United States. "I had one year of community college in Kuala Lumpur, and then transferred as a freshman into the University of Michigan (https://umich.edu/)." As a 10-year old, Scarlett said: "I was interested in tech, especially the gaming field. I was always interested in the collaborative aspect of tech, the basis of what I am working on now." Scarlett matriculated to the university after looking for a sense of "community" among colleges. Everyone was friendly. However, "my first year was not a very good year. But I really believed in myself, although I was struggling to adapt to the whole situation," she said. "Besides being an international student, I was a first-generation college student in my family. And I didn't have family members around." Scarlett initially faltered academically. "It was a shock to realize that I was coming from behind. Not having enough peer support, in an academic sense, brought me into a space where I felt alone, and not sure of what I was doing." She had to "figure out strategies of how to reach out" for help. Scarlett initially closed herself off, but now "I have grown to be a more open person, more self-aware," knowing when to ask for help. "There are a lot of other students that face this issue, too." Scarlett joined the university's "optiMize Social Innovation Challenge" in freshman year. Her first project created a "gamified" classroom experience for elementary school students. She then entered the university's business school, with an emphasis on consulting. "I joined a pro bono consulting club on campus." There, she worked with the famed Zingerman's and created a framework to enable past employees to stay in touch with the current Zingerman's community. She discovered that consulting was NOT her personal life mission. "I am the kind of person who likes to get her hands dirty, make 'hands-on' impact, see things through," she said. "This is when passion comes into play. Not only did I pick myself up, but I wanted to create something that would help others pick themselves up." Scarlett discovered U of M's Zell-Lurie Institute, dedicated to advancing knowledge and practice of innovation. "They offer a lot of support in terms of mentorship, grant-funding." She also joined the Michigan Venture Capital Association as a summer intern, where she worked on the association's 2017 annual landscape guide and "changed the game" in terms of its data collection, data collation, and interactive reporting. She entered U of M's CAMPUS OF THE FUTURE competition, which "reimagined" techniques and spaces for teaching and learning in the 21st century. Scarlett and her partners tried to create a "study mentor" using artificial intelligence (AI). Her team was one of 5 finalists that had the unique opportunity to meet with Amazon's Vice President of Development. From this, Scarlett learned one of her key life lessons: "What makes a startup successful is not just a cool idea. Ours was too far in advance. There wasn't yet a market for it." Scarlett shed the original team, and took the kernel of that concept as the foundation for Peerstachio, conceptually launched in September 2017. Scarlett takes both computer science classes and Udemy courses.
"But I knew, deep in my heart, that even though I love tech, I wouldn't be a professional coder. I partnered with a friend of mine, who is currently our CTO (Chief Technology Officer)." She also recruited a UX designer to implement the "front end" of Peerstachio, while the CTO manages the back end. Scarlett manages fundraising, the company's overall vision, market research, marketing, competitive analysis, and sales. Peerstachio's main mission is to help students improve grades by connecting underclassmen with a trusted cadre of older students, to get academic questions answered in a highly responsive fashion. The MVP (Minimum Viable Product) website is built on a Django, HTML5, SCSS and JavaScript stack, with sqlite3 (PostgreSQL in the future) on the back end of the site. Scarlett ensured that she validated real customer need by conducting detailed surveys of potential clients. (At the time of our Diva Tech Talk interview, the Peerstachio team had interviewed 110 students, globally.) Scarlett is becoming a highly experienced entrepreneur. For instance, she said, "If any startup tells you there is no competition, they need to do more research." For Peerstachio, "I feel fortunate we have competitors. That means there is a market, and room for us to improve." Scarlett thinks that passion, perseverance and a sense of purpose are propelling her, as well as empathy for the customer. "I would add one more: positivity," she said. "A 'growth mindset' has kept me going." Scarlett's fond hope is that she, and others like her, can show that "if you have some goals, and try to get what you desire, that is ultimate happiness." Her biggest fear is missing something that she could accomplish. Scarlett's leadership lessons include: develop deep listening skills, collaborate, be confident and decisive, and lead by example. She stressed that, in a startup, "there is no such thing as 9 to 5. Work and life are intertwined." To balance that, Scarlett follows a "process of layers of priority" to juggle multiple goals successfully. Many of those goals are tied to making life better for others. She also recoups energy through exercise and reading, watching videos, and reflecting. Note: Peerstachio has received grants since this interview, including DTX at TechTown Detroit: https://techtowndetroit.org/?press-release=u-m-startup-peerstachio-receives-inaugural-10000-gm-go-award. Make sure to check us out online at www.divatechtalk.com, on Twitter @divatechtalks, and on Facebook at https://www.facebook.com/divatechalk.
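As a rough illustration of the sqlite3-to-PostgreSQL migration path mentioned above, here is a minimal Django settings sketch. The database name and credentials are hypothetical placeholders, not details from the interview; Django's built-in postgresql backend does the rest.

```python
# settings.py — a minimal sketch of swapping Django's default sqlite3
# backend for PostgreSQL. All names and credentials are hypothetical;
# in practice, secrets would come from the environment.
DATABASES = {
    "default": {
        # Development default: a single on-disk sqlite3 file.
        # "ENGINE": "django.db.backends.sqlite3",
        # "NAME": BASE_DIR / "db.sqlite3",

        # Production: PostgreSQL via the psycopg2 driver.
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "peerstachio",     # hypothetical database name
        "USER": "app",             # hypothetical role
        "PASSWORD": "change-me",   # placeholder only
        "HOST": "127.0.0.1",
        "PORT": "5432",
    }
}
```

After the switch, running the usual `python manage.py migrate` recreates the schema in PostgreSQL; the application code itself stays unchanged, which is the point of Django's database abstraction.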

Hanselminutes - Fresh Talk and Tech for Developers
Polyglot Persistence for .NET with PostgreSQL and Marten with Jeremy Miller

Hanselminutes - Fresh Talk and Tech for Developers

Play Episode Listen Later Sep 28, 2017 31:55


There are so many great open source projects and stacks to choose from in the .NET ecosystem. Scott talks to Jeremy Miller about "Marten" – it offers Polyglot Persistence for .NET systems using the PostgreSQL database as the backend. You get both a Document Database with JSON support and an Event Store! Jeremy talks about all the great options you have for persisting your objects.
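Marten's own API is C#, but the underlying idea is easy to demonstrate against PostgreSQL directly. Below is a minimal sketch of the document-on-PostgreSQL pattern using Python's psycopg2 driver – an illustration of the concept, not of how Marten itself is implemented; the connection string and table name are hypothetical.

```python
# A sketch of the "document database on PostgreSQL" idea that Marten builds
# on: documents live in a jsonb column and are queried by their fields.
# Assumptions: psycopg2 installed; a reachable database named "app".
import json
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id   serial PRIMARY KEY,
        body jsonb NOT NULL
    )
""")

# Store an arbitrary document as JSON, as a document database would.
cur.execute("INSERT INTO documents (body) VALUES (%s)",
            (json.dumps({"name": "Ada", "role": "admin"}),))

# Query by a field inside the document; ->> extracts a text value.
cur.execute("SELECT body FROM documents WHERE body ->> 'name' = %s", ("Ada",))
print(cur.fetchone()[0])

conn.commit()
cur.close()
conn.close()
```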

TechSNAP
Episode 314: Cyber Liability | TechSNAP 314

TechSNAP

Play Episode Listen Later Apr 12, 2017 104:42


We cover some fascinating new research that can steal your phone's PIN using just the on-board sensors. Then we cover how computer security is broken from top to bottom, and Dan does another deep dive, this time on everyone's favorite database, PostgreSQL. Plus it's your feedback, a huge roundup & so much more!


Consultor IT
012. Databases in RAM

Consultor IT

Play Episode Listen Later Jul 27, 2016 18:12


In today's podcast we talk about a topic I'm passionate about: databases in RAM! But before explaining the benefits of in-RAM databases, let's look at what a database is and how it works. How does a database work? Since the purpose of this article isn't to explain how databases work, we'll just give a short summary for all the readers who are learning computing little by little and want to know the advantages of in-RAM databases over conventional ones. In short, a database is an ordered way of classifying information so that it can later be accessed, modified, created and deleted through "queries". In fact, there are many programs for creating databases, and although most of them are used with the same language (SQL), that isn't mandatory. For example, one of the most widely used programs for creating databases is MySQL, but there are many others, such as Oracle, PostgreSQL, SQLite, etc. So, in short, how does a database work? As we said, it's an ordered way of classifying information – information that we can later access from our preferred programming language, such as PHP, Java, C, etc., through requests called queries. For example, when a program says "database, give me all the information for the customer Luis Peris", the database goes to the database file on the hard disk, retrieves the information and sends it to the program we're developing. The important thing here is to understand that the information in databases like MySQL is stored on the hard disk almost 100% of the time. Databases in RAM: and this is where the problems appear, because the hard disk is much slower than we might think, so the read speed is often not enough. To put it in perspective, a classic spinning hard disk has a speed of between 100 and 200 MBps. That may seem like a lot, but now look at the speed of the new solid-state drives, which run between 200 and 500 MBps. That can seem like plenty; in fact, many people have said, "Then let's switch to solid-state drives!", and that is why many hosting providers are now starting to offer only these drives. But for many, this speed is still not enough for some tasks, so the idea arose of creating databases that run in RAM. And do you know how fast RAM is? DDR3, for example, can reach 18,000 MBps – that is, 18 GB per second! If we compare it with a normal 100 MBps hard disk, we would have to combine the speed of 180 hard disks to match RAM. And if someone starts to wonder why this type of database isn't used more: well, besides the fact that they are designed for very specific tasks, keep one thing in mind – RAM is extremely expensive! Consider that when you buy a computer, it may have 1 terabyte of hard disk but only 8 gigabytes of RAM. That is, the ratio is about 0.8% comparing hard disk against RAM. If RAM were cheaper, don't you think we would have more of it? What are in-RAM databases for? I mainly see three reasons to use databases in RAM. Ultra-fast queries: the first reason is for when we need very high speeds.
In-RAM databases

And this is where the problems appear: the hard drive is much slower than we tend to think, so its read speed is often not enough. To put it in perspective, a classic spinning hard drive reads at roughly 100-200 MB/s. That may sound like a lot, but now look at the newer solid-state drives, which reach roughly 200-500 MB/s. That seems like plenty, and indeed many people have said "let's switch to solid-state drives!", which is why many hosting providers are now starting to offer only these drives. But for some tasks even that speed is not enough, which is why databases were designed to run in RAM. And do you know how fast RAM is? DDR3, for example, can reach around 18,000 MB/s, that is, 18 GB per second! Compared with a conventional hard drive at 100 MB/s, you would have to combine the speed of 180 hard drives (18,000 / 100) to match RAM.

And if anyone is wondering why this kind of database isn't used more, well, besides being designed for very specific tasks, keep one thing in mind: RAM is very expensive! Consider that when you buy a computer, it may come with 1 terabyte of hard drive but only 8 gigabytes of RAM; that is, the RAM amounts to only 0.8% of the disk capacity. If RAM were cheaper, don't you think we would have more of it?

What are in-RAM databases good for?

I mainly see three reasons to use in-RAM databases:

Ultra-fast queries

The first reason is when we need very high speeds. A simple example: when you search for something on Google, suggestions appear with every keystroke. For that to happen, there has to be an ultra-fast database behind it, so using Redis, for example, to keep that database in RAM would be a very smart choice.

In-RAM databases as a cache

Without a doubt, caching is one of the main reasons to use this kind of database alongside the ones we already use. Let's take an example: imagine you run a social network with 10 million active users, each of whom checks their feed 10 times a day on average. That gives us at least 100 million requests. Remember that, in this case, each request carries a cookie; we have to use that cookie to ask our database (MySQL, for example) whether it belongs to a user and, if so, to return that user's information. We would have to perform that operation 100 million times a day. With this kind of database (Redis, memcached, etc.) we can save ourselves billions of database queries per month: we simply copy the cookies and the basic user information into RAM, so when a request arrives we consult RAM instead of the hard drive. Naturally, this translates into saving thousands of dollars a year.
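As a loose sketch of the cache pattern described above, the following Python snippet consults RAM (Redis) first and only falls back to the slow, disk-backed database on a miss. It assumes a Redis server running on localhost, and fetch_user_from_mysql is a hypothetical stand-in for the real MySQL lookup.

```python
import json

import redis  # pip install redis; assumes a Redis server on localhost

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_user_from_mysql(cookie: str) -> dict:
    # Hypothetical stand-in for the slow, disk-backed MySQL query.
    return {"cookie": cookie, "name": "Luis Peris"}

def get_user(cookie: str) -> dict:
    """Cache-aside: try RAM first, hit the disk-backed database only on a miss."""
    cached = r.get(f"session:{cookie}")
    if cached is not None:
        return json.loads(cached)             # served from RAM, no disk access
    user = fetch_user_from_mysql(cookie)      # slow path: one query to MySQL
    r.setex(f"session:{cookie}", 3600, json.dumps(user))  # keep it in RAM for 1 hour
    return user
```

With 100 million requests a day, every cache hit here is one MySQL query that never happens, which is exactly the saving the episode describes.
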
Research

Research into new drugs needs supercomputers to process huge amounts of information in very little time. Thanks to in-RAM databases, researchers can further optimize their algorithms to keep as much information as possible in RAM. This removes the bottleneck that appears when the hard drive cannot read data as fast as the processor can process it, and it can also save thousands of dollars a year.

Summary and conclusion

The important thing to understand is that there are no better or worse databases; the ideal is to combine the ones we need in order to optimize the load time of our website. Finally, remember that we wrote an article about Redis (a type of in-RAM database), which you can read here: Redis, base de datos en RAM. Every Monday, Wednesday, and Friday you can listen to us on our website in this section, on iVoox, or on iTunes. The post 012. Bases de datos en RAM appeared first on Luis Peris.