Podcasts about BigQuery

  • 103 PODCASTS
  • 280 EPISODES
  • 40m AVG DURATION
  • 1 EPISODE EVERY OTHER WEEK
  • Mar 7, 2023 LATEST

POPULARITY (chart: 2015–2022)


Best podcasts about BigQuery

Latest podcast episodes about BigQuery

Screaming in the Cloud
The Realities of Working in Data with Emily Gorcenski

Screaming in the Cloud

Play Episode Listen Later Mar 7, 2023 36:22


Emily Gorcenski, Data & AI Service Line Lead at Thoughtworks, joins Corey on Screaming in the Cloud to discuss how big data is changing our lives - both for the better, and the challenges that come with it. Emily explains how data is only important if you know what to do with it and have a plan to work with it, and why it's crucial to understand the use-by date on your data. Corey and Emily also discuss how big data problems aren't universal problems for the rest of the data community, how to address the ethics around AI, and the barriers to entry when pursuing a career in data. About EmilyEmily Gorcenski is a principal data scientist and the Data & AI Service Line Lead of ThoughtWorks Germany. Her background in computational mathematics and control systems engineering has given her the opportunity to work on data analysis and signal processing problems from a variety of complex and data intensive industries. In addition, she is a renowned data activist and has contributed to award-winning journalism through her use of data to combat extremist violence and terrorism. The opinions expressed are solely her own.Links Referenced: ThoughtWorks: https://www.thoughtworks.com/ Personal website: https://emilygorcenski.com Twitter: https://twitter.com/EmilyGorcenski Mastodon: https://mastodon.green/@emilygorcenski@indieweb.social TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Emily Gorcenski, who is the Data and AI Service Line Lead over at ThoughtWorks. Emily, thank you so much for joining me today. I appreciate it.Emily: Thank you for having me. I'm happy to be here.Corey: What is it you do, exactly? Take it away.Emily: Yeah, so I run the data side of our business at ThoughtWorks, Germany. That means data engineering work, data platform work, data science work. I'm a data scientist by training. And you know, we're a consulting company, so I'm working with clients and trying to help them through the, sort of, morphing landscape that data is these days. You know, should we be migrating to the cloud with our data? What can we migrate to the cloud with our data? Where should we be doing with our data scientists and how do we make our data analysts' lives easier? So, it's a lot of questions like that and trying to figure out the strategy and all of those things.Corey: You might be one of the most perfectly positioned people to ask this question to because one of the challenges that I've run into consistently and persistently—because I watch a lot of AWS keynotes—is that they always come up with the same talking point, that data is effectively the modern gold. And data is what unlocks value to your busin—“Every business agrees,” because someone who's dressed in what they think is a nice suit on stage is saying that it's, “Okay, you're trying to sell me something. What's the deal here?” Then I check my email and I discover that Amazon has sent me the same email about the same problem for every region I've deployed things to in AWS. And, “Oh, you deploy this to one of the Japanese regions. 
We're going to send that to you in Japanese as a result.”And it's like, okay, for a company that says data is important, they have no idea who any of their customers are at this point; that is the takeaway here. How real is, “Data is important,” versus, “We charge by the gigabyte so you should save all of your data and then run expensive things on top of it.”Emily: I think data is very important, if you know what you're going to do with it and if you have a plan for how to work with it. I think if you look at the history of computing, of technology, if you go back 20 years to maybe the early days of the big data era, right? Everyone's like, “Oh, we've got big data. Data is going to be big.” And for some reason, we never questioned why, like, we were thinking that the ‘big' in ‘big data' meant ‘big' as in volume and not ‘big' as in ‘big pharma.'This sort of revolution never really happened for most companies. Sure, some companies got a lot of value from the, sort of, data mining and just gather everything and collect everything and if you hit it with a big computational hammer, insights will come out and somehow these insights will make you money through magic. The reality is much more prosaic. If you want to make money with data, you have to have a plan for what you're going to do with data. You have to know what you're looking for and you have to know exactly what you're going to get when you look at your data and when you try to answer questions with it.And so, when we see somebody like Amazon not being able to correlate the fact that you're the account owner for all of these different accounts and that the language should be English and all of these things, that's part of the operational problem because it's annoying to try to do joins across multiple tables in multiple regions and all of those things, but it's also part—you know, nobody has figured out how this adds value for them to do that, right? There's a part of it where it's like, this is just professionalism, but there's a part of it, where it's also like… whatever. You've got Google Translate. Figure it out yourself. We're just going to get through it.I think that… as time has evolved from the initial waves of the big data era into the data science era, and now we're in, you know, all sorts of different architectures and principles and all of these things, most companies still haven't figured out what to do with data, right? They're still investing a ton of money to answer the same analytics questions that they were answering 20 years ago. And for me, I think that's a disappointment in some regards because we do have better tools now. We can do so many more interesting things if you give people the opportunity.Corey: One of the things that always seemed a little odd was, back when I wielded root credentials in anger—‘anger,' of course, being my name for the production environment, as opposed to, “Theory,” which is what I call staging because it works in theory, but not in production. I digress—it always felt like I was getting constant pushback from folks of, “You can't delete that data.
It's incredibly important because one day, we're going to find a way to unlock the magic of it.” And it's, “These are web server logs that are 15 years old, and 98% of them by volume are load balancer health checks because it turns out that back in those days, baby seals got more hits than our website did, so that's not really a thing that we wind up—that's going to add much value to it.” And then from my perspective, at least, given that I tend to live, eat, sleep, breathe cloud these days, AWS did something that was refreshingly customer-obsessed when they came out with Glacier Deep Archive.Because the economics of that are if you want to store a petabyte of data, with a 12-hour latency on request for things like archival logs and whatnot, it's $1,000 a month per petabyte, which is okay, you have now hit a price point where it is no longer worth my time to argue with you. We're just not going to delete anything ever again. Problem solved. Then came GDPR, which is neither here nor there and we actually want to get rid of those things for a variety of excellent legal reasons. And the dance continues.But my argument against getting rid of data because it's super expensive no longer holds water in the way that it once did for anything remotely resembling a reasonable amount of data. Then again, that's getting reinvented all the time. I used to be very, I guess we'll call it, I guess, a data minimalist. I don't want to store a bunch of data, mostly because I'm not a data person. I am very bad at thinking in that way.I consider SQL to be the chess of the programming world and I'm not particularly great at it. And I'm also unlucky and have an aura, so if I destroy a bunch of stateless web servers, okay, we can all laugh about that, but let's keep me the hell away from the data warehouse if we still want a company tomorrow morning. And that was sort of my experience. And I understand my bias in that direction. But I'm starting to see magic get unlocked.Emily: Yeah, I think, you know, you said earlier, there's, like, this mindset, like, data is the new gold or data is new oil or whatever. And I think it's actually more true that data is the new milk, right? It goes bad if you don't use it, you know, before a certain point in time. And at a certain point in time, it's not going to be very offensive if you just leave it locked in the jug, but as soon as you try to open it, you're going to have a lot of problems. Data is very, very cheap to store these days. It's very easy to hold data; it's very expensive to process data.And I think that's where the shift has gone, right? There's sort of this, like, Oracle DBA legacy of, like, “Don't let the software developers touch the prod database.” And they've kind of kept their, like, arcane witchcraft to themselves, and that mindset has persisted. But now it's sort of shifted into all of these other architectural patterns that are just abstractions on top of this, don't let the software engineers touch the data store, right? So, we have these, like, streaming-first architectures, which are great. They're great for software devs. And they're great for data engineers who like to play with big powerful technology.They're terrible if you want to answer a question, like, “How many customers did I have yesterday?” And these are the things that I think are some of the central challenges, right? A Kappa architecture—you know, streaming-first architecture—is amazing if you want to improve your application developer throughput. 
And it's amazing if you want to build real-time analytics or streaming analytics into your platform. But it's terrible if you want your data lake to be navigable. It's terrible if you want to find the right data that makes sense to do the more complex things. And it becomes very expensive to try to process it.Corey: One of the problems I think I have that is that if I take a look at the data volumes that I work with in my day-to-day job, I'm dealing with AWS billing data as spit out by the AWS billing system. And there isn't really a big data problem here. If you take a look at some of the larger clients, okay, maybe I'm trying to consume a CSV that's ten gigabytes. Yes, Excel is going to violently scream itself to death if I try to wind up loading it there, and then my computer smells like burning metal all afternoon. But if it fits in RAM, it doesn't really feel like it's a big data problem, on some level.And it just feels that when I look at the landscape of all the different tools you can use for things like this, they just feel like it's more or less, hmm, “I have a loose thread on my shirt. Could you pass me that chainsaw for a second?” It just seems like stupendous overkill for anything that I'm working with. Counterpoint; that the clients I'm working with have massive data farms and my default response when I meet someone who's very good at an area that I don't do a lot of work in is—counterintuitively to what a lot of people apparently do on Twitter—is not the default assumption of oh, “I don't know anything about that space. It must be worthless and they must be dumb.”No. That is not the default approach to take anything, from my perspective. So, it's clear there's something very much there that I just don't see slash understand. That is a very roundabout way of saying what could be uncharitably distilled down to, “So, is your entire career bullshit?” But no, it is clearly not.There is value being extracted from this and it's powerful. I just think that there's been an industry-wide, relatively poor job done of explaining that value in ways that don't come across as contrived or profoundly disturbing.Emily: Yeah, I think there's a ton of value in doing things right. It gets very complicated to try to explain the nuances of when and how data can actually be useful, right? Oftentimes, your historical data, you know, it really only tells you about what happened in the past. And you can throw some great mathematics at it and try to use it to predict the future in some sense, but it's not necessarily great at what happens when you hit really hard changes, right?For example, when the Coronavirus pandemic hit and purchaser and consumer behavior changed overnight. There was no data in the data set that explained that consumer behavior. And so, what you saw is a lot of these things like supply chain issues, which are very heavily data-driven on a normal circumstance, there was nothing in that data that allowed those algorithms to optimize for the reality that we were seeing at that scale, right? Even if you look at advanced logistics companies, they know what to do when there's a hurricane coming or when there's been an earthquake or things like that. They have disaster scenarios.But nobody has ever done anything like this at the global scale, right? And so, what we saw was this hard reset that we're still feeling the repercussions of today. 
Yes, there were people who couldn't work and we had lockdowns and all that stuff, but we also have an effect from the impact of the way that we built the systems to work with the data that we need to shuffle around. And so, I think that there is value in being able to process these really, really large datasets, but I think that actually, there's also a lot of value in being able to solve smaller, simpler problems, right? Not everything is a big data problem, not everything requires a ton of data to solve.It's more about the mindset that you use to look at the data, to explore the data, and what you're doing with it. And I think the challenge here is that, you know, everyone wants to believe that they have a big data problem because it feels like you have to have a big data problem if you—Corey: All the cool kids are having this kind of problem.Emily: You have to have big data to sit at the grownup's table. And so, what's happened is we've optimized a lot of tools around solving big data problems and oftentimes, these tools are really poor at solving normal data problems. And there's a lot of money being spent in a lot of overkill engineering in the data space.Corey: On some level, it feels like there has been a dramatic misrepresentation of this. I had an article that went out last year where I called machine-learning selling pickaxes into a digital gold rush. And someone I know at AWS responded to that and probably the best way possible—she works over on their machine-learning group—she sent me a foam Minecraft pickaxe that now is hanging on my office wall. And that gets more commentary than anything, including the customized oil painting I have of Billy the Platypus fighting an AWS Billing Dragon. No, people want to talk about the Minecraft pickaxe.It's amazing. It's first, where is this creativity in any of the marketing that this department is putting out? But two it's clearly not accurate. And what it took for me to see that was a couple of things that I built myself. I built a Twitter thread client that would create Twitter threads, back when Twitter was a place that wasn't overrun by some of the worst people in the world and turned into BirdChan.But that was great. It would automatically do OCR on images that I uploaded, it would describe the image to you using Azure's Cognitive Vision API. And that was magic. And now I see things like ChatGPT, and that's magic. But you take a look at the way that the cloud companies have been describing the power of machine learning in AI, they wind up getting someone with a doctorate whose first language is math getting on stage for 45 minutes and just yelling at you in Star Trek technobabble to the point where you have no idea what the hell they're saying.And occasionally other data scientists say, “Yeah, I think he's just shining everyone on at this point. But yeah, okay.” It still becomes unclear. It takes seeing the value of it for it to finally click. People make fun of it, but the Hot Dog, Not A Hot Dog app is the kind of valuable breakthrough that suddenly makes this intangible thing very real for people.Emily: I think there's a lot of impressive stuff and ChatGPT is fantastically impressive. I actually used ChatGPT to write a letter to some German government agency to deal with some bureaucracy. It was amazing. It did it, was grammatically correct, it got me what I needed, and it saved me a ton of time. I think that these tools are really, really powerful.Now, the thing is, not every company needs to build its own ChatGPT. 
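For a concrete sense of the “normal data problem” Emily describes—her “how many customers did I have yesterday?” example—here is a minimal sketch in plain pandas, with no streaming stack involved. The file name and columns (orders.csv, customer_id, order_ts) are hypothetical.

```python
# A minimal sketch of answering "how many customers did I have yesterday?"
# from a plain table of orders. File name and column names are hypothetical.
from datetime import date, timedelta

import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["order_ts"])

yesterday = date.today() - timedelta(days=1)
is_yesterday = orders["order_ts"].dt.date == yesterday
customers_yesterday = orders.loc[is_yesterday, "customer_id"].nunique()

print(f"Distinct customers yesterday: {customers_yesterday}")
```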
Maybe they need to integrate it, maybe there's an application for it somewhere in their landscape of product, in their landscape of services, in the landscape of their interim internal tooling. And I would be thrilled actually to see some of that be brought into reality in the next couple of years. But you also have to remember that ChatGPT is not something that came because we have, like, a really great breakthrough in AI last year or something like that. It stacked upon 40 years of research.We've gone through three new waves of neural networking in that time to get to this point, and it solves one class of problem, which is honestly a fairly narrow class of problem. And so, what I see is a lot of companies that have much more mundane problems, but where data can actually still really help them. Like how do you process Cambodian driver's licenses with OCR, right? These are the types of things that if you had a training data set that was every Cambodian person's driver's license for the last ten years, you're still not going to get the data volumes that even a day worth of Amazon's marketplace generates, right? And so, you need to be able to solve these problems still with data without resorting to the cudgel that is a big data solution, right?So, there's still a niche, a valuable niche, for solving problems with data without having to necessarily resort to, we have to load the entire internet into our stream and throw GPUs at it all day long and spend hundreds of—tens of millions of dollars in training. I don't know, maybe hundreds of millions; however much ChatGPT just raised. There's an in-between that I think is vastly underserved by what people are talking about these days.Corey: There is so much attention being given to this and it feels almost like there has been a concerted and defined effort to almost talk in circles and remove people from the humanity and the human consequences of what it is that they're doing. When I was younger, in my more reckless years, I was never much of a fan of the idea of government regulation. But now it has become abundantly clear that our industry, regardless of how you want to define industry, how—describe a society—cannot self-regulate when it comes to data that has the potential to ruin people's lives. I mean, I spent a fair bit of my time in my career working in financial services in a bunch of different ways. And at least in those jobs, it was only money.The scariest thing I ever dealt with, from a data perspective is when I did a brief stint at Grindr because that was the sort of problem where if that data gets out, people will die. And I have not had to think about things like that have that level of import before or since, for which I'm eternally grateful. “It's only money,” which is a weird thing for a guy who fixes cloud bills for a living to say. And if I say that in a client call, it's not going to go very well. But it's the truth. Money is one of those things that can be fixed. It can be addressed in due course. There are always opportunities there. Someone just been outed to their friends, family, and they feel their life is now in shambles around them, you can't unring that particular bell.Emily: Yeah. And in some countries, it can lead to imprisonment, or—Corey: It can lead to death sentences, yes. It's absolutely not acceptable.Emily: There's a lot to say about the ethics of where we are. 
And I think that as a lot of these high profile, you know, AI tools have come out over the last year or so, so you know, Stable Diffusion and ChatGPT and all of this stuff, there's been a lot of conversation that is sort of trying to put some counterbalance on what we're seeing. And I don't know that it's going to be successful. I think that, you know, I've been speaking about ethics and technology for a long time and I think that we need to mature and get to the next level of actually addressing the ethical problems in technology. Because it's so far beyond things like, “Oh, you know, if there's a biased training data set and therefore the algorithm is biased,” right?Everyone knows that by now, right? And the people who don't know that, don't care. We need to get much beyond where, you know, these conversations about ethics and technology are going because it's a manifold problem. We have issues with the people labeling this data are paid, you know, pennies per hour to deal with some of the most horrific content you've ever seen. I mean, I'm somebody who has immersed myself in a lot of horrific content for some of the work that I have done, and this is, you know, so far beyond what I've had to deal with in my life that I can't even imagine it. You couldn't pay me enough money to do it and we're paying people in developing nations, you know, a buck-thirty-five an hour to do this. I think—Corey: But you must understand, Emily, that given the standard of living where they are, that that is perfectly normal and we wouldn't want to distort local market dynamics. So, if they make a buck-fifty a day, we are going to be generous gods and pay them a whopping dollar-seventy a day, and now we feel good about ourselves. And no, it's not about exploitation. It's about raising up an emerging market. And other happy horseshit that lies people tell themselves.Emily: Yes, it is. Yes, it is. And we've built—you know, the industry has built its back on that. It's raised itself up on this type of labor. It's raised itself up on taking texts and images without permission of the creators. And, you know, there's—I'm not a lawyer and I'm not going to play one, but I do know that derivative use is something that at least under American law, is something that can be safely done. It would be a bad world if derivative use was not something that we had freely available, I think, and on the balance.But our laws, the thing is, our laws don't account for the scale. Our laws about things like fair use, derivative use, are for if you see a picture and you want to take your own interpretation, or if you see an image and you want to make a parody, right? It's a one-to-one thing. You can't make 5 million parody images based on somebody's art, yourself. These laws were never built for this scale.And so, I think that where AI is exploiting society is it's exploiting a set of ethics, a set of laws, and a set of morals that are built around a set of behavior that is designed around normal human interaction scales, you know, one person standing in front of a lecture hall or friends talking with each other or things like that. The world was not meant for a single person to be able to speak to hundreds of thousands of people or to manipulate hundreds of thousands of images per day. It's actually—I find it terrifying. Like, the fact that me, a normal person, has a Twitter following that, you know, if I wanted to, I can have 50 million impressions in a month. 
This is not a normal thing for a normal human being to have.And so, I think that as we build this technology, we have to also say, we're changing the landscape of human ethics by our ability to act at scale. And yes, you're right. Regulation is possibly one way that can help this, but I think that we also need to embed cultural values in how we're using the technology and how we're shaping our businesses to use the technology. It can be used responsibly. I mean, like I said, ChatGPT helped me with a visa issue, sending an email to the immigration office in Berlin. That's a fantastic thing. That's a net positive for me; hopefully, for humanity. I wasn't about to pay a lawyer to do it. But where's the balance, right? And it's a complex topic.Corey: It is. It absolutely is. There is one last topic that I would like to talk to you about that's a little less heavy. And I've got to be direct with you that I'm not trying to be unkind, but you've disappointed me. Because you mentioned to me at one point, when I asked how things were going in your AWS universe, you said, “Well, aside from the bank heist, reasonably well.”And I thought that you were blessed as with something I always look for, which is the gift of glorious metaphor. Unfortunately, as I said, you've disappointed me. It was not a metaphor; it was the literal truth. What the hell kind of bank heist could possibly affect an AWS account? This sounds like something out of a movie. Hit me with it.Emily: Yeah, you know, I think in the SRE world, we tell people to focus on the high probability, low impact things because that's where it's going to really hurt your business, and let the experts deal with the black swan events because they're pretty unlikely. You know, a normal business doesn't have to worry about terrorists breaking into the Google data center or a gang of thieves breaking into a bank vault. Apparently, that is something that I have to worry about because I have some data in my personal life that I needed to protect, like all other people. And I decided, like a reasonable and secure and smart human being who has a little bit of extra spending cash that I would do the safer thing and take my backup hard drive and my old phones and put them in a safety deposit box at an old private bank that has, you know, a vault that's behind the meter-and-a-half thick steel door and has two guards all the time, cameras everywhere. And I said, “What is the safest possible thing that you can do to store your backups?” Obviously, you put it in a secure storage location, right? And then, you know, I don't use my AWS account, my personal AWS account so much anymore. I have work accounts. I have test accounts—Corey: Oh, yeah. It's honestly the best way to have an AWS account is just having someone else having a payment instrument attached to it because otherwise oh God, you're on the hook for that yourself and nobody wants that.Emily: Absolutely. And you know, creating new email addresses for new trial accounts is really just a pain in the ass. So, you know, I have my phone, you know, from five years ago, sitting in this bank vault and I figured that was pretty secure. Until I got an email [laugh] from the Berlin Polizei saying, “There has been a break-in.” And I went and I looked at the news and apparently, a gang of thieves has pulled off the most epic heist in recent European history.This is barely in the news. Like, unless you speak German, you're probably not going to find any news about this. 
But a gang of thieves broke into this bank vault and broke open the safety deposit boxes. And it turns out that this vault was also the location where a luxury watch consigner had been storing his watches. So, they made off with some, like, tens of millions of dollars of luxury watches. And then also the phone that had my 2FA for my Amazon account. So, the total value, you know, potential theft of this was probably somewhere in the $500 million range if they set up a SageMaker instance on my account, perhaps.Corey: This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your. Engineers. Are. Burned. Out. They're tired from pagers waking them up at 2 am for something that could have waited until after their morning coffee. Ring Ring, Who's There? It's Nagios, the original call of duty! They're fed up with relying on two or three different “monitoring tools” that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there's a better way. Observability tools like Honeycomb (and very little else because they do admittedly set the bar) show you the patterns and outliers of how users experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It's great for your business, great for your engineers, and, most importantly, great for your customers. Try FREE today at honeycomb.io/screaminginthecloud. That's honeycomb.io/screaminginthecloud.Corey: The really annoying part that you are going to kick yourself on about this—and I'm not kidding—is, I've looked up the news articles on this event and it happened, something like two or three days after AWS put out the best release of last year's re:Invent, or any other—past, present, future—which is finally allowing multiple MFA devices on root accounts. So finally, we can stop having safes with these things or you can have two devices or you can have multiple people in Covid times out of remote sides of different parts of the world and still get into the thing. But until then, nope. It's either no MFA or you have to store it somewhere ridiculous like that and access becomes a freaking problem in the event that the device is lost, or in this case stolen.Emily: [laugh]. I will just beg the thieves, if you're out there, if you're secretly actually a bunch of cloud engineers who needed to break into a luxury watch consignment storage vault so that you can pay your cloud bills, please have mercy on my poor AWS account. But also I'll tell you that the credit card attached to it is expired so you won't have any luck.Corey: Yeah. Really sad part. Having the expired credit card just means that the charge won't go through. They're still going to hold you responsible for it. It's the worst advice I see people—Emily: [laugh].Corey: Well-intentioned—giving each other on places like Reddit where the other children hang out. And it's, “Oh, just use a prepaid gift card so it can only charge you so much.” It's yeah, and then you get exploited like someone recently was and start accruing $60,000 a day in Lambda charges on an otherwise idle account and Amazon will come after you with a straight face after a week. And, like, “Yes, we'd like our $360,000, please.”Emily: Yes.Corey: “We tried to charge the credit card and wouldn't you know, it expired. Could you get on that please? We'd like our money faster if you wouldn't mind.” And then you wind up in absolute hell. 
Now, credit where due, they in every case I am aware of that is not looking like fraud's close cousin, they have made it right, on some level. But it takes three weeks of back and forth and interminable waiting.And you're sitting there freaking out, especially if you're someone who does not have a spare half-million dollars sitting around. Imagine who—“You sound poor. Have you tried not being that?” And I'm firmly convinced that it a matter of time until someone does something truly tragic because they don't understand that it takes forever, but it will go away. And from my perspective, there's no bigger problem that AWS needs to fix than surprise lifelong earnings bills to some poor freaking student who is just trying to stand up a website as part of a class.Emily: All of the clouds have these missing stairs in them. And it's really easy because they make it—one of the things that a lot of the cloud providers do is they make it really easy for you to spin up things to test them. And they make it really, really hard to find where it is to shut it all down. The data science is awful at this. As a data scientist, I work with a lot of data science tools, and every cloud has, like, the spin up your magical data science computing environment so that your data scientist can, like, bang on the data with you know, high-performance compute for a while.And you know, it's one click of a button and you type in a couple of na—you know, a couple of things name, your service or whatever, name your resource. You click a couple buttons and you spin it up, but behind the scenes, it's setting up a Kubernetes cluster and it's setting up some storage bucket and it's setting up some data pipelines and it's setting up some monitoring stuff and it's setting up a VM in order to run all of this stuff. And the next thing that you know, you're burning 100, 200 euro a day, just to, like, to figure out if you can load a CSV into pandas using a Jupyter Notebook. And you're like—when you try to shut it all down, you can't. It's you have to figure, oh, there is a networking thing set up. Well, nobody told me there's a networking thing set up. You know? How do I delete that?Corey: You didn't say please, so here you go. Without for me, it's not even the giant bill going from $4 a month in S3 charges to half a million bucks because that is pretty obvious from the outside just what the hell's been happening. It's the little stuff. I am still—since last summer—waiting for a refund on $260 of ‘because we said so' SageMaker credits because of a change of their billing system, for a 45-minute experiment I had done eight months before that.Emily: Yep.Corey: Wild stuff. Wild stuff. And I have no tolerance for people saying, “Oh, you should just read the pricing page and understand it better.” Yeah, listen, jackhole. I do this for a living. If I can fall victim to it, anyone can. I promise. It is not that I don't know how the billing system works and what to do to avoid unexpected charges.And I'm just luck—because if I hadn't caught it with my systems three days into the month, it would have been a $2,000 surprise. And yeah, I run a company. I can live with that. I wouldn't be happy, but whatever. It is immaterial compared to, you know, payroll.Emily: I think it's kind of a rite of passage, you know, to have the $150 surprise Redshift bill at the end of the month from your personal test account. And it's sad, you know? I think that there's so much better that they can do and that they should do. 
Sort of as a tangent, one of the challenges that I see in the data space is that it's so hard to break into data because the tooling is so complex and it requires so much extra knowledge, right? If you want to become a software developer, you can develop a microservice on your machine, you can build a web app on your machine, you can set up Ruby on Rails, or Flask, or you know, .NET, or whatever you want. And you can do all of that locally.And you can learn everything you need to know about React, or Terraform, or whatever, running locally. You can't do that with data stuff. You can't do that with BigQuery. You can't do that with Redshift. The only way that you can learn this stuff is if you have an account with that setup and you're paying the money to execute on it. And that makes it a really high barrier for entry for anyone to get into this space. It makes it really hard to learn. Because if you want to learn anything by doing, like many of us in the industry have done, it's going to cost you a ton of money just to [BLEEP] around and find out.Corey: Yes. And no one likes the find out part of those stories.Emily: Nobody likes to find out when it comes to your bill.Corey: And to tie it back to the data story of it, it is clearly some form of batch processing because it tries to be an eight-hour consistency model. Yeah, I assume for everything, it's 72. But what that means is that you are significantly far removed from doing a thing and finding out what that thing costs. And that's the direct charges. There's always the oh, I'm going to set things up and it isn't going to screw you over on the bill. You're just planting a beautiful landmine you're going to stumble blindly into in three months when you do something else and didn't realize what that means.And the worst part is it feels victim-blamey. I mean, this is my pro—I guess this is one of the reasons I guess I'm so down on data, even now. It's because I contextualize it in a sense of the AWS bill. No one's happy dealing with that. You ever met a happy accountant? You have not.Emily: Nope. Nope [laugh]. Especially when it comes to clouds stuff.Corey: Oh yeah.Emily: Especially these days, when we're all looking to save energy, save money in the cloud.Corey: Ideally, save the planet. Sustainability and saving money align on the axis of ‘turn that shit off.' It's great. We can hope for a brighter tomorrow.Emily: Yep.Corey: I really want to thank you for being so generous with your time. If people want to learn more, where can they find you? Apparently filing police reports after bank heists, which you know, it's a great place to meet people.Emily: Yeah. You know, the largest criminal act in Berlin is certainly a place you want to go to get your cloud advice. You can find me, I have a website. It's my name, emilygorcenski.com.You can find me on Twitter, but I don't really post there anymore. And I'm on Mastodon at some place because Mastodon is weird and kind of a mess. But if you search me, I'm really not that hard to find. My name is harder to spell, but you'll see it in the podcast description.Corey: And we will, of course, put links to all of this in the show notes. Thank you so much for your time. I really appreciate it.Emily: Thank you for having me.Corey: Emily Gorcenski, Data and AI Service Line Lead at ThoughtWorks. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insipid, insulting comment, talking about why data doesn't actually matter at all. And then the comment will disappear into the ether because your podcast platform of choice feels the same way about your crappy comment.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
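Emily's point about the barrier to entry—that you can't really learn BigQuery or Redshift on your laptop the way you can learn a web framework—has one partial workaround worth noting: small queries against BigQuery public datasets. The sketch below assumes Google Cloud credentials are already configured and that you have checked query pricing (or the free sandbox tier) first; it is an illustration, not a guaranteed-free recipe.

```python
# A rough sketch of low-cost experimentation: a small query against a
# BigQuery public dataset using the official client library.
from google.cloud import bigquery

client = bigquery.Client()  # picks up project and credentials from the environment

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row["name"], row["total"])
```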

The Stack Overflow Podcast
Shorten the distance between production data and insight

The Stack Overflow Podcast

Play Episode Listen Later Feb 22, 2023 20:27


Modern networked applications generate a lot of data, and every business wants to make the most of that data. Most of the time, that means moving production data through some transformation process to get it ready for the analytics process. But what if you could have in-app analytics? What if you could generate insights directly from production data?On this sponsored episode of the podcast, we talk with Stanimira Vlaeva, Developer Advocate at MongoDB, and Fredric Favelin, Technical Director, Partner Presales at MongoDB, about how a serverless database can minimize the distance between producing data and understanding it.Episode notes:Stanimira talked a lot about using BigQuery with MongoDB Atlas on Google Cloud Run. If you need to skill up on these three tools, check out this tutorial. Once you've got the hang of it, get your data connected with Confluent Connectors. With Atlas, you can transform your data in JavaScript. Connect with Stanimira on LinkedIn and Twitter. Connect with Fredric on LinkedIn. Congrats to Stellar Question winner SubniC for "Get name of current script in Python."
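A hedged sketch of the “in-app analytics” idea the episode discusses: computing an insight directly against a production MongoDB collection with an aggregation pipeline, rather than exporting the data to a warehouse first. The connection string, database, collection, and field names below are all hypothetical.

```python
# Illustrative only: aggregate directly over a production collection.
# Connection string, database, collection, and fields are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
orders = client["shop"]["orders"]

pipeline = [
    {"$match": {"status": "completed"}},
    {"$group": {"_id": "$country", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
    {"$limit": 5},
]

for doc in orders.aggregate(pipeline):
    print(doc["_id"], doc["revenue"])
```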

Keep Optimising
Attribution: Is GA4 all you need? with Jill Quick

Keep Optimising

Play Episode Listen Later Feb 8, 2023 40:53


Jill Quick is an analytics consultant and trainer at The Coloring In Department. Originally a client-side marketer, she's been helping brands of all sizes to get to grips with their measurement strategy since 2011. We're all starting to get to grips with GA4 – but is it all we need?

In this episode we discuss:

  • Why attribution doesn't need to be so complicated
  • Being clear on who you are presenting the data to
  • Why User Acquisition and Traffic Acquisition are key reports to monitor
  • Having the right analytics in place to ensure that the data coming through is correct
  • Getting ready for BigQuery and the future of your raw data

Check out Triple Whale now >> https://keepopt.com/triplewhale
Find everything Keep Optimising at KeepOptimising.com
This podcast uses the following third-party services for analysis: Chartable - https://chartable.com/privacy
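On the “getting ready for BigQuery” point, a rough sketch of the kind of query the GA4 BigQuery export enables once raw event data starts landing: daily users by session source. The project ID and dataset name (analytics_123456789) are placeholders for your own export dataset; the events_* schema shown is the standard GA4 export layout.

```python
# Sketch of querying the GA4 BigQuery export: daily users by traffic source.
# Replace my-project.analytics_123456789 with your own export dataset.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT
      event_date,
      traffic_source.source AS source,
      COUNT(DISTINCT user_pseudo_id) AS users
    FROM `my-project.analytics_123456789.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
    GROUP BY event_date, source
    ORDER BY event_date, users DESC
"""

for row in client.query(query).result():
    print(row["event_date"], row["source"], row["users"])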

Data Engineering Podcast
Let Your Business Intelligence Platform Build The Models Automatically With Omni Analytics

Data Engineering Podcast

Play Episode Listen Later Jan 30, 2023 50:43


Summary Business intelligence has gone through many generational shifts, but each generation has largely maintained the same workflow. Data analysts create reports that are used by the business to understand and direct the business, but the process is very labor and time intensive. The team at Omni have taken a new approach by automatically building models based on the queries that are executed. In this episode Chris Merrick shares how they manage integration and automation around the modeling layer and how it improves the organizational experience of business intelligence. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Truly leveraging and benefiting from streaming data is hard - the data stack is costly, difficult to use and still has limitations. Materialize breaks down those barriers with a true cloud-native streaming database - not simply a database that connects to streaming systems. With a PostgreSQL-compatible interface, you can now work with real-time data using ANSI SQL including the ability to perform multi-way complex joins, which support stream-to-stream, stream-to-table, table-to-table, and more, all in standard SQL. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today and sign up for early access to get started. If you like what you see and want to help make it better, they're hiring (https://materialize.com/careers/) across all functions! Your host is Tobias Macey and today I'm interviewing Chris Merrick about the Omni Analytics platform and how they are adding automatic data modeling to your business intelligence Interview Introduction How did you get involved in the area of data management? Can you describe what Omni Analytics is and the story behind it? What are the core goals that you are trying to achieve with building Omni? Business intelligence has gone through many evolutions. What are the unique capabilities that Omni Analytics offers over other players in the market? What are the technical and organizational anti-patterns that typically grow up around BI systems? What are the elements that contribute to BI being such a difficult product to use effectively in an organization? Can you describe how you have implemented the Omni platform? How have the design/scope/goals of the product changed since you first started working on it? What does the workflow for a team using Omni look like? What are some of the developments in the broader ecosystem that have made your work possible? What are some of the positive and negative inspirations that you have drawn from the experience that you and your team-mates have gained in previous businesses? What are the most interesting, innovative, or unexpected ways that you have seen Omni used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Omni? When is Omni the wrong choice? What do you have planned for the future of Omni? Contact Info LinkedIn (https://www.linkedin.com/in/merrickchristopher/) @cmerrick (https://twitter.com/cmerrick) on Twitter Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. 
The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Omni Analytics (https://www.exploreomni.com/) Stitch (https://www.stitchdata.com/) RJ Metrics (https://en.wikipedia.org/wiki/RJMetrics) Looker (https://www.looker.com/) Podcast Episode (https://www.dataengineeringpodcast.com/looker-with-daniel-mintz-episode-55/) Singer (https://www.singer.io/) dbt (https://www.getdbt.com/) Podcast Episode (https://www.dataengineeringpodcast.com/dbt-data-analytics-episode-81/) Teradata (https://www.teradata.com/) Fivetran (https://www.fivetran.com/) Apache Arrow (https://arrow.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/voltron-data-apache-arrow-episode-346/) DuckDB (https://duckdb.org/) Podcast Episode (https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/) BigQuery (https://cloud.google.com/bigquery) Snowflake (https://www.snowflake.com/en/) Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Lenny's Podcast: Product | Growth | Career
An inside look at Mixpanel's product journey | Vijay Iyengar (Head of Product)

Lenny's Podcast: Product | Growth | Career

Play Episode Listen Later Jan 26, 2023 46:51


Brought to you by Pando—Always on employee progression (https://www.pando.com/lenny), Notion—One workspace. Every team (https://www.notion.com/lennyspod), and Lemon.io—A marketplace of vetted software developers (https://lemon.io/lenny).Vijay Iyengar is Head of Product at Mixpanel, and similar to myself, came from an engineering background before transitioning to product. In today's episode, he explains how Mixpanel has evolved its growth strategy from a fast-paced, feature-focused approach to a more deliberate approach that prioritizes design and user experience. He also shares how Mixpanel irons out customer problems, including implementing internal tools that allow engineering and product teams to respond to customer feedback directly. Additionally, Vijay shares his top SaaS products, books, frameworks, and more. Tune in to gain valuable insights from a seasoned product leader.Find the transcript for this episode and all past episodes at: https://www.lennyspodcast.com/episodes/. Today's transcript will be live by 8 a.m. PT.Where to find Vijay Iyengar:• Twitter: https://twitter.com/vijayiyengar• LinkedIn: https://www.linkedin.com/in/vijay4/Where to find Lenny:• Newsletter: https://www.lennysnewsletter.com• Twitter: https://twitter.com/lennysan• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/Referenced:• Mixpanel: https://mixpanel.com/• Figma: https://www.figma.com/• Notion: https://www.notion.so/• “Shape Up: Stop Running in Circles and Ship Work That Matters”: https://basecamp.com/shapeup• The RICE prioritization framework: https://www.productplan.com/glossary/rice-scoring-model/• BigQuery: https://cloud.google.com/bigquery• Census: https://www.getcensus.com/• Zoom: https://zoom.us/• FigJam: https://www.figma.com/figjam/• A Data Stack for PLG teams: https://mixpanel.com/blog/data-analytics-product-led-growth/• Product analytics in the modern data stack: https://mixpanel.com/blog/mixpanel-partners-with-census-to-bring-product-analytics-to-the-modern-data-stack/• Snowflake: https://www.snowflake.com/en/• Amazon Redshift: https://www.amazonaws.cn/en/redshift/• Event-Based Analytics: https://developer.mixpanel.com/docs/under-the-hood• The Goal: A Process of Ongoing Improvement: https://www.amazon.com/Goal-Process-Ongoing-Improvement/dp/0884271951• Cool Gray City of Love: 49 Views of San Francisco: https://www.amazon.com/Cool-Gray-City-Love-Francisco/dp/1608199606• The West Wing Weekly podcast: http://thewestwingweekly.com/• WeCrashed on AppleTV+: https://tv.apple.com/us/show/wecrashed/• Severance on AppleTV+: https://tv.apple.com/us/show/severance/• Gibson Biddle on Lenny's Podcast: https://www.lennyspodcast.com/gibson-biddle-on-his-dhm-product-strategy-framework-gem-roadmap-prioritization-framework-5-netflix-strategy-mini-case-studies-building-a-personal-board-of-directors-and-much-more/• Shishir Mehrotra on Lenny's Podcast: https://www.lennyspodcast.com/the-rituals-of-great-teams-shishir-mehrotra-coda-youtube-microsoft/In this episode, we cover:(00:00) Vijay's background(04:07) How Vijay learned to be more open-minded to new ideas (06:26) Mixpanel's journey(12:40) When to optimize for speed(13:49) The feature phase vs. 
the design phase(17:02) The importance of not losing focus on your core product(19:52) How Mixpanel organizes teams around buckets of problems(20:43) Mixpanel's most recent six-month time horizon planning cycle(25:08) The RICE framework for prioritization (and when to ignore the C and E)(26:31) The problem with estimations, and why Basecamp suggests using a six-week time box(30:04) How Mixpanel keeps product teams and engineers connected to customers via Slack (33:21) SaaS tools Mixpanel's teams use(34:54) The biggest product analytics mistakes(37:34) The present and future of analytics (41:05) How adopting a product mindset has helped Vijay grow his career(41:47) Lightning roundProduction and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com. Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe
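For readers unfamiliar with the RICE framework mentioned at 25:08, a quick illustration of the standard scoring formula (Reach × Impact × Confidence ÷ Effort). This is the generic framework, not Mixpanel's internal process, and the sample numbers are made up.

```python
# Generic RICE scoring: Reach * Impact * Confidence / Effort. Sample data is invented.
def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """Reach (users/quarter) * Impact (0.25-3) * Confidence (0-1) / Effort (person-months)."""
    return (reach * impact * confidence) / effort

features = {
    "saved reports": rice_score(reach=4000, impact=2, confidence=0.8, effort=3),
    "dark mode": rice_score(reach=9000, impact=0.5, confidence=1.0, effort=2),
}

for name, score in sorted(features.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:,.0f}")
```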

The Cloud Pod
195: The Cloud Pod can't wait for Azure Ultra Fungible Storage (Premium)!

The Cloud Pod

Play Episode Listen Later Jan 20, 2023 48:49


On The Cloud Pod this week, Amazon announces massive corporate and tech layoffs and S3 encrypts new objects by default, BigQuery multi-statement transactions are now generally available, and Microsoft announces acquisition of Fungible to accelerate datacenter innovation. Thank you to our sponsor, Foghorn Consulting, which provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you're having trouble hiring? Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week. General News: Amazon to lay off 18,000 corporate and tech workers. [1:11] Episode Highlights ⏰ Amazon S3 Encrypts New Objects By Default. [3:09] ⏰ Announcing the GA of BigQuery multi-statement transactions. [13:04] ⏰ Microsoft announces acquisition of Fungible to accelerate datacenter innovation. [17:14] Top Quote
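On the BigQuery multi-statement transactions announcement, a small sketch of the syntax: the statements between BEGIN TRANSACTION and COMMIT TRANSACTION either all commit or all roll back. The project, dataset, and table names below are hypothetical, and the script is submitted as a single multi-statement query job.

```python
# Sketch of a BigQuery multi-statement transaction run as one query job.
# Dataset and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
BEGIN TRANSACTION;

DELETE FROM `my-project.finance.ledger` WHERE entry_date < '2022-01-01';

INSERT INTO `my-project.finance.ledger_archive`
SELECT * FROM `my-project.finance.ledger_staging`;

COMMIT TRANSACTION;
"""

client.query(sql).result()  # if any statement fails, the transaction does not commit
```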

Experiencing Data with Brian O'Neill
108 - Google Cloud's Bruno Aziza on What Makes a Good Customer-Obsessed Data Product Manager

Experiencing Data with Brian O'Neill

Play Episode Listen Later Jan 10, 2023 50:43


Today I'm chatting with Bruno Aziza, Head of Data & Analytics at Google Cloud. Bruno leads a team of outbound product managers in charge of BigQuery, Dataproc, Dataflow and Looker and we dive deep on what Bruno looks for in terms of skills for these leaders. Bruno describes the three patterns of operational alignment he's observed in data product management, as well as why he feels ownership and customer obsession are two of the most important qualities a good product manager can have. Bruno and I also dive into how to effectively abstract the core problem you're solving, as well as how to determine whether a problem might be solved in a better way.    Highlights / Skip to: Bruno introduces himself and explains how he created his “CarCast” podcast (00:45) Bruno describes his role at Google, the product managers he leads, and the specific Google Cloud products in his portfolio (02:36) What Bruno feels are the most important attributes to look for in a good data product manager (03:59) Bruno details how a good product manager focuses on not only the core problem, but how the problem is currently solved and whether or not that's acceptable (07:20) What effective abstracting the problem looks like in Bruno's view and why he positions product management as a way to help users move forward in their career (12:38) Why Bruno sees extracting value from data as the number one pain point for data teams and their respective companies (17:55) Bruno gives his definition of a data product (21:42) The three patterns Bruno has observed of operational alignment when it comes to data product management (27:57) Bruno explains the best practices he's seen for cross-team goal setting and problem-framing (35:30)   Quotes from Today's Episode   “What's happening in the industry is really interesting. For people that are running data teams today and listening to us, the makeup of their teams is starting to look more like what we do [in] product management.” — Bruno Aziza (04:29) “The problem is the problem, so focus on the problem, decompose the problem, look at the frictions that are acceptable, look at the frictions that are not acceptable, and look at how by assembling a solution, you can make it most seamless for the individual to go out and get the job done.” – Bruno Aziza (11:28)   “As a product manager, yes, we're in the business of software, but in fact, I think you're in the career management business. Your job is to make sure that whatever your customer's job is that you're making it so much easier that they, in fact, get so much more done, and by doing so they will get promoted, get the next job.” – Bruno Aziza (15:41)   “I think that is the task of any technology company, of any product manager that's helping these technology companies: don't be building a product that's looking for a problem. Just start with the problem back and solution from that. Just make sure you understand the problem very well.” (19:52)   “If you're a data product manager today, you look at your data estate and you ask yourself, ‘What am I building to save money? When am I building to make money?' If you can do both, that's absolutely awesome. And so, the data product is an asset that has been built repeatedly by a team and generates value out of data.” – Bruno Aziza (23:12)   “[Machine learning is] hard because multiple teams have to work together, right? You got your business analyst over here, you've got your data scientists over there, they're not even the same team. 
And so, sometimes you're struggling with just the human aspect of it.” (30:30)   “As a data leader, an IT leader, you got to think about those soft ways to accomplish the stuff that's binary, that's the hard [stuff], right? I always joke, the hard stuff is the soft stuff for people like us because we think about data, we think about logic, we think, ‘Okay if it makes sense, it will be implemented.' For most of us, getting stuff done is through people. And people are emotional, how can you express the feeling of achieving that goal in emotional value?” – Bruno Aziza (37:36)   Links As referenced by Bruno, “Good Product Manager/Bad Product Manager”: https://a16z.com/2012/06/15/good-product-managerbad-product-manager/ LinkedIn: https://www.linkedin.com/in/brunoaziza/ Bruno's Medium Article on Competing Against Luck by Clayton M. Christensen: https://brunoaziza.medium.com/competing-against-luck-3daeee1c45d4 The Data CarCast on YouTube:  https://www.youtube.com/playlist?list=PLRXGFo1urN648lrm8NOKXfrCHzvIHeYyw

All About Affordable NFTs
Anatomy of a HODLR - Using Dune to Decide | Interview: Block199.io

All About Affordable NFTs

Play Episode Listen Later Dec 26, 2022 37:09


Block199.io Interview

For our very first guest podcast, Rantum spoke with ElBarto_crypto, a Dune wizard and crypto data scientist working on research at Block199.io. Recently, he's researched and published on topics like NFT hodler segmentation, airdrops, and NFT wash trading.

How to check if a collection is made up of hodlers or flippers
How to check wash trading volume for collections & platforms
What tools ElBarto uses to analyze collections
What topics he plans to research next

Rough Transcript

[00:00:00] Rantum: All right, here we are. This is Rantum, here with our very first guest. If you've listened for a little while, we've talked about this on the intro for a long time, that we would have guests, and we've yet to actually do that. Well, that's up until today. We have another data scientist joining us today. [00:01:06] He actually reached out, ElBarto Crypto as he goes by on Twitter. He reached out and asked to be a guest on the podcast, and I felt like we really needed to actually do it. So here we are. Welcome to the show, ElBarto Crypto. [00:01:24] Elbarto Crypto: Well, thanks for having me. I've realized in life that if you just ask people enough, they either get sick of you or they let you do something. [00:01:30] So in this case, I'm glad to be the latter. [00:01:33] Rantum: Yeah, really excited to actually do this. It's a good, needed little instigator to make it happen. Perfect, I love it. So tell me a little bit about yourself. How'd you get where you are? How'd you get into crypto? [00:01:48] Elbarto Crypto: Yeah, so I am actually a data scientist by trade. [00:01:51] I worked in marketing analytics for probably a little longer than a decade, doing everything from segmentation and machine learning to just writing basic SQL queries. I got to spend a lot of time doing customer analytics for very large brands in the marketing space. I caught the Bitcoin bug around the end of 2014, and you couldn't really do anything at that point. [00:02:17] You could just send Bitcoin to each other, and that was fun. And then the Ethereum ICO came. Once again, people were still just sending Ether around, maybe trading on EtherDelta, these crazy underground exchanges. For a while, really the only thing you could do was not get caught up in an ICO scam. [00:02:39] And then right around 2019, I believe it was, Dune Analytics launched, and that was sort of the great equalizer for data scientists, because now all of a sudden you had this platform that you could tap into to query the blockchain, you know, for free, honestly. And that was the most beautiful thing. [00:02:58] You could easily share queries with people, share dashboards with people. This analytics community sort of formed around that. And then other products launched, like SEN and Flipside Crypto, and a lot of open blockchain data landed on BigQuery. So now all of a sudden you didn't have to worry about data engineering, running your own validator, running your own node. [00:03:18] You could just query data and build, you know, beautiful data sets.
And I think a lot of things from the marketing analytics background definitely applied to crypto, just in terms of, you know, user retention, who's actually using products, machine learning, things like that. So yeah, absolutely. [00:03:34] It's been a very natural transition to analyzing data. And now it's gotten crazy. It's a lot of fun, though, and you meet a lot of people like yourself. [00:03:46] Rantum: Similar background. I did a lot in e-commerce analytics and marketing before this, and made my way into crypto. [00:03:53] And it's really exciting having access to all of the data, as opposed to, you know, just what the one company that you're working with provides. Right? [00:04:03] Elbarto Crypto: Yeah. I mean, that's such a good point. I think people don't realize how special this data set is. If you go work for, you know, a big enterprise company, even as a data scientist, well, one, you probably don't get access to data right away. [00:04:17] You probably have to get approved. You have to, you know, build the trust of people to start analyzing data. Here, from day one, it's open. I think Google BigQuery even gives you $300 of free credit, so you can just tap in immediately and start querying data. It's really amazing. [00:04:35] Rantum: Yeah. What did you get started with when you started on Dune? I mean, you saw that it was available, there's free data. What were the first projects that you got interested in? [00:04:46] Elbarto Crypto: Yeah, so the first thing was really just looking at who's using the platform. [00:04:52] And as many people know, my dog, who's in the background, is running from farm to farm, is what he's doing. I think at first it was just querying the raw blockchain data itself to really understand what's going on. I think one of the biggest problems people face is there is this open data set, but immediately they get inundated with these very specific contract calls, and they have to decode data, and all of a sudden you're dealing with 0x's and hashes. [00:05:25] And I think at first people get really overwhelmed. I remember looking at Dune for the first time and just seeing all these contract abstractions, I think it must have started with Aave back then, and just thinking, what is going on here? This seems impossible. So I think it definitely takes a lot of patience, but at first it was just, you know, how many people are using the Ethereum blockchain? [00:05:47] How many people a day are using it? How many people used it last month and are now using it this month again? So for me, it was really just basic before diving into more of the abstractions around Aave and then Compound and really seeing that lending side, and then just trying to visualize who's making big DEX trades. You know, just very basic, like, hey, let's just sort DEX trades by USD volume and see who's buying things. [00:06:14] So I think, yeah, at first it was super basic, because honestly it's very overwhelming when you first look at it. Yeah, absolutely.
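[Editor's note: as a rough illustration of the kind of "how many people a day are using Ethereum" question ElBarto describes, here is a minimal sketch against BigQuery's public Ethereum dataset. The dataset name and column names are assumed (bigquery-public-data.crypto_ethereum.transactions), and you need your own Google Cloud project and credentials; this is not a query from the episode itself.]

```python
# Minimal sketch: daily active senders and transaction counts on Ethereum,
# queried from BigQuery's public crypto_ethereum dataset (assumed to exist
# under this name). Requires `pip install google-cloud-bigquery` and a GCP
# project with BigQuery enabled.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

query = """
SELECT
  DATE(block_timestamp) AS day,
  COUNT(DISTINCT from_address) AS active_senders,
  COUNT(*) AS tx_count
FROM `bigquery-public-data.crypto_ethereum.transactions`
WHERE block_timestamp >= TIMESTAMP('2023-01-01')
GROUP BY day
ORDER BY day
"""

for row in client.query(query).result():
    print(row.day, row.active_senders, row.tx_count)
```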
[00:06:20] Rantum: I mean, having all of the data available is, is great. And it's also overwhelming, right? Having all the choices. Exactly. [00:06:29] Elbarto Crypto: Everything's thrown at you at once. It's uh, yeah, it's a lot of our. [00:06:34] Rantum: How did you, you know, or 2019 it was, you know, a lot of D trades, you know, then we got into, you know, the kind of defi after that and then, you know, saw, you know, NFTs come, you know, big in, you know, in 2021. I, you know, it's is when it was, you know, really took off. How did you, uh, What, what did, how did you start working with the N F T data? [00:06:56] Um, and how'd you find that different than working with some of the, uh, the early token data? Yeah, [00:07:01] Elbarto Crypto: I like, the reason I like N F T data a lot is because, um, the same token, i e d and the same, um, the transfers across trade. So I think one of the biggest problems you have with Dex trades and analyzing like defi data, , um, people can immediately go anonymous. [00:07:19] Like, if I make huge Dex trade I can then just go back into Binance or just send my phones back on a Coinbase or and then it just becomes like an, an empty hole. Like, well, what happens with that person? What happens with that token? But for n n T data, since it's all on chain and you can't really, and each token is, is essentially a, a unique token, it really lends to some interesting analytics around. [00:07:42] how long people hold NFTs. So I think that's what one of the things that drew me to it was, it's understanding the nuances are obviously like very complicated, like with lost shells, which we'll probably talk about later. But from the very beginning, it made sense to me that like ID one is always ID one and it always corresponds to the ID one. [00:08:03] So it, it really lends to some beautiful, like long-term a. , um, with NFT [00:08:08] Rantum: drill. Yeah. That, that non, that non fungible aspect really takes exactly [00:08:12] Elbarto Crypto: way, right, . Yeah, exactly. I think people take, I, I think, I think people don't realize like how unique that is. Uh, and definitely allows for, for interesting transac for interesting analysis. [00:08:22] Rantum: Yeah. I mean, I, I've definitely found that interesting. When you look at NFT collections, I mean, you can't look at trades as, you can't look at every trade as the same. I mean, there are. oftentimes reasons that things are trading at, at much, maybe something significantly higher than the floor. You know, sometimes there aren't, and you know, we know there's a lot of wash trading out there. [00:08:42] Um, have you found that, have you found that aspect difficult to decipher, you know, what is real versus, uh, sort of the wash trade or how much do you [00:08:52] Elbarto Crypto: Yeah. Uh, shout out to Hilda today for, for introducing a new, uh, uh, wash trading flag. So at first, like I remember this is like really dating myself. So like, well let's, you know, starting with like the looks rare, um, that was like when, when people first like, oh man, yeah, that, um, wash trade concept of like, hey, volume is exploding, but in reality like people are just nibis and terraforms. [00:09:19] Yeah, exactly. So I think it's like, yeah, I think it. It's so interesting because you, you, you can do very basic, you can do very good analytics, just the basic, you know, SQL group buy statements, and this is, this is Data Pod Science podcast. We'll just go right into and nerding it out. 
But you can do very like, easy analytics with, you know, group buy day, you know, what's the volume of n ft sales? [00:09:41] And I think a lot of a lot of mainstream publications like to just look at like, okay, um, volume is down 80. . But in reality, like it's obviously much more different than that. So, we now know that, uh, a lot of volume all looks rare was really just people farming will look for a token just mindlessly trading tokens. [00:09:57] And so from there it, you know, it'll, it, it forced people to sort of think a little deeper in terms of, you know, a trade is not just a trade and you know, what is an actual wash sale trade or and now I think. . It's interesting because blur almost, it, it forces you to rethink about analytics once again, because we know like what a, a wash trade is now. [00:10:18] But now there's like these very interesting, um, analytics going on about people, you know. Taking this one step further and doing all sorts of crazy placing orders like slightly above the floor price when they not be a wash sale or maybe swapping out NFTs. So like let's say if you own a board ape and you just want to farm, you don't really care what ape you own and you just kind of wanna farm the the blur sale, you can sort of just. [00:10:42] Sell eight, buy another one, sell that one, buy another one. You're not really changing anything. Even taking slate laws [00:10:48] Rantum: to Exactly, yeah. Of the blurred to eventual, blurred [00:10:51] Elbarto Crypto: token . Exactly. In the hopes of getting a $50,000 airdrop. Uh, so it's interesting now, like, as these N F T models evolve for, for revenue for these exchanges, like then doing the due diligence of analytics to, to figuring out, you know, what's. [00:11:09] Rantum: Yeah, it, it's, you know, we see these incentives and, I mean, we've seen all these, uh, sort of vampire attacks on open sea and different ways to try to maybe disincentivize people from gaming, the systems, and it seems like. Collectors or, you know, maybe not, and maybe those, I shouldn't really call those people the collectors. [00:11:28] These are the, the people out there. There's somebody that's going to figure out how to, to make the most of the system. And, you know, it's, it's a tough one when you to, to, to look ahead and realize what people are going to [00:11:38] Elbarto Crypto: do, . Oh, absolutely. Yeah. And I think it's, it, it's, I don't wanna say it's fa as an analytics person, but like it definitely makes you think, and it definitely challenges you to say, , how can I use data to, to understand what, what's really going [00:11:53] Rantum: on? [00:11:55] Yeah. I mean, it, it is, I mean, all of these different crypto charts, there's always, you know, looks all steady until there's a serious inflection point and something drastic changes. And, and you see this all the time. I mean, you know, and in just the volume that you see among opens sea and blur and, and looks rare and, you know, you, you sort of need to know the story behind why these, these things are changing so drastically. [00:12:17] Elbarto Crypto: Yeah, exactly. And I think that makes you a better analytics person too. Like I, you know, if, if you're looking at break into this sector, and you know, like on job interviews and things like that. Like definitely bringing up points of like, well, why do you think um, hey, this data looks like the way it is. Is it sustainable? [00:12:33] Exactly to your point. Like, is it sustainable? Like, what's really going on here? 
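[Editor's note: a sketch of the "group by day, sum the volume, and don't count the obvious wash trades" idea discussed above. The column names and toy rows are assumptions, and the single heuristic here (same wallet on both sides, or a pair of wallets passing the same token back and forth) is deliberately naive; the real filters used on Dune are more involved.]

```python
# Naive sketch: daily NFT volume per collection with trivially suspicious
# trades flagged and excluded. Assumes trades have already been exported
# into a DataFrame with these (illustrative) columns.
import pandas as pd

trades = pd.DataFrame({
    "day":        ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "collection": ["doodles", "doodles", "doodles", "pudgypenguins"],
    "token_id":   [1, 1, 7, 42],
    "buyer":      ["0xaaa", "0xbbb", "0xaaa", "0xccc"],
    "seller":     ["0xbbb", "0xaaa", "0xddd", "0xeee"],
    "usd_value":  [5000.0, 5100.0, 4800.0, 9000.0],
})

# Flag the obvious cases: buyer == seller, or the same two wallets trading
# the same token back and forth.
trades["pair"] = trades.apply(lambda r: tuple(sorted([r.buyer, r.seller])), axis=1)
back_and_forth = trades.duplicated(subset=["collection", "token_id", "pair"], keep=False)
trades["wash_flag"] = (trades.buyer == trades.seller) | back_and_forth

organic_volume = (
    trades[~trades.wash_flag]
    .groupby(["day", "collection"])["usd_value"]
    .sum()
    .rename("organic_usd_volume")
)
print(organic_volume)
```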
I think, do you really have a ability to differentiate, like in this space, if you can bring these, you know, differentiated analytics. So can you tell me [00:12:46] Rantum: a little bit about block 1 99 and then the N F T Hudler segmentation that you've been working? [00:12:53] Elbarto Crypto: Yeah. So, um, we are essentially just a research firm. Uh, we do research for, you know, individuals for protocols, uh, all across the stack. And so a while back someone was essentially wanted to understand, you know, what, you know, how people, how different N F T projects are and what their users look like. [00:13:15] So for example, like if you have an N F T project where. , most of the holders are people that you know, are these larger whales? Maybe just like trading in and out of things, you know, what does the long-term engagement really look like versus an N F T project where a lot of people. we're in it from the beginning. [00:13:37] And, um, they're true loyalists like of a project. And so what they were trying to understand was, you know, should I, you know, invest in this N F T project because are are the people that are really long term holders. So the way to do that, and, and it's really a, a beauty of the blockchain is you, you can, uh, you can essentially just take all the holders of a certain n ft collection and then you. [00:14:02] uh, count how many NFTs they own. Some the amount of volume that's been spent look at, you know, are they how much wash sales have they done? How many floor sweeps have they done? So you essentially have this data frame of, you know, the 10, you know, the 5,000 holders and, you know, what's all their N F T activity. [00:14:19] From there, you can just like run a kme segmentation to find these different, um, groups of, of people and then figure out like, okay, well what is their activity in the entire N F T space? So if you see that, you know, 90% of the holders, you know, on average holding N F T collection for 20 days, or are the, you know, top 10% of traders on X two, Y two, you may think twice about buying that FT collection because, you know, people are just flipping it for the sake [00:14:47] Rantum: of Flipp. [00:14:48] Right. Right. And so when you're looking at these, the activity, you're looking at activity from all [00:14:55] Elbarto Crypto: NFTs, [00:14:56] Rantum: that, that may have ha uh, been passed through this wallet. You're not looking at necessarily that specific collection. Right, [00:15:01] Elbarto Crypto: exactly. Yeah. And I think like, you, you know, you can, you can almost sort of like then do this matrix if you will. [00:15:08] Like, you could segment, um, just like current activity and then sort of like create this matrix of like Yeah. All other n ft activity to get really, like, dive into. How the users play out. But ideally, like what you're looking for and in the case of this person that they were looking for was, they wanted to see that people who are like true contributors to the N F T community or they had like strong holdings and like what would held on, would hold on a lot. [00:15:32] think like some good examples of that are like, if you look at pudgy penguins, um, shout out to the, to the poo group. Um, a lot of them have, if you, you just look at simple like distributions of how long they've been the project for. Like a lot of them, um, have held since the beginning almost. And so you kind of look at that community and say like, okay, is that's something that I wanna do for the long term. [00:15:54] and the price. 
And, you know, the price has done pretty well. They had on a recent mini surge this, um, past couple of weeks. But it makes sense because, you know, just the supply of, of poos to hit the market is if no one's selling, it's always going down the, the number, the amount of supply that's hitting the market. [00:16:11] So, um, it makes sense that, you know, eventually, uh, Prices price go up. So Right. Versus, uh, versus like, you see [00:16:20] Rantum: the sustained interest, I guess is, is something that it's a little hard to measure from inactivity, but if they're not, if they're not dumping, there is something there. Right. . Yeah, exactly. [00:16:30] Elbarto Crypto: And I think that's the, that's like the holy grail of NFT engagement is how do you measure engagement of, of the nons sellers. [00:16:37] Because right now I, you know, Like volume does not equal engagement. If someone is selling their N F T, they're the least engaged with the brand. [00:16:45] Rantum: Right, right. Volume is is a very poor measure of, of the quality of a [00:16:50] Elbarto Crypto: project. Exactly. I, the problem is like unfortunately, like we don't have another way to, to do this. [00:16:55] And I think like the ways so far, it's interesting looking at the, the board ape ecosystem and, and, and maybe you can see. , you know, how many people are participating in governance or how many, you know, a polls also hold ape coin. But, you know, getting outside of the blockchain, you know, engagement becomes a much more interesting thing. [00:17:18] I think the evolution of NFTs should be some sort of like, loyalty, you know, coming back to the marketing world, you know, some sort of like marketing, uh, you know, CRM engagement platform where you can engage like outside of the blockchain. . Uh, so it's, it's, it's definitely in its ancy. And I think I'll, I think, I think NFT brands that are, are data first and are looking to expand their analytics capability are definitely gonna do better. [00:17:43] Rantum: Yeah. It feels like there's just a lot not being used. I mean, you can obviously find out so much about your, your holders and what their interests are just by looking at the wallet, and I don't think that's being, being used by many, uh, project creators or uh, leaders at this point. [00:17:59] Elbarto Crypto: Oh yeah, absolutely agree. [00:18:01] Rantum: Um, so you've developed something called an nf or you're working on something called an N F D D Gen Score. Can you just tell me a little bit about that? [00:18:08] Elbarto Crypto: Yeah, so I'm looking for someone to take this over actually. So DM me because I'm going out on crypto paternity leave soon. But essentially, uh, there's something called the defi, um, D Degen score, or it's just, I think it's just called DEGEN score. [00:18:23] um, your defi activity, right? And I, I want to build, I'm in the process of building, haven't built yet. Sort of like this N F T engagement or D gen score of, okay, how many you know, how many projects do you hold? How many mints have you done? How many, how many trades have you placed? Um, the idea here is more along the lines of it's almost like a segmentation, but. [00:18:46] if you're an N F T project, that's going to whitelist some you know, whitelist or project for certain holders or addresses. It would be nice if you had an idea of, of, you know, who would you want to whitelist this for? And like, do you wanna exclude certain groups? 
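[Editor's note: a minimal sketch of the holder-segmentation workflow ElBarto describes, one feature row per wallet, k-means over the features, then profiling each segment so you can decide which groups to include or exclude. The feature names, toy numbers, and cluster count are all assumptions for illustration, not Block199's actual methodology.]

```python
# Sketch: cluster NFT holders into segments (long-term holders, flippers,
# farmers, ...) from per-wallet features, then inspect the average profile
# of each segment.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

holders = pd.DataFrame({
    "wallet":             ["0xa1", "0xb2", "0xc3", "0xd4", "0xe5", "0xf6"],
    "nfts_held":          [12, 1, 85, 3, 40, 2],
    "lifetime_usd_spent": [40_000, 800, 250_000, 2_500, 90_000, 1_200],
    "avg_hold_days":      [210, 9, 35, 400, 60, 14],
    "wash_trades":        [0, 0, 14, 0, 3, 0],
    "floor_sweeps":       [1, 0, 9, 0, 4, 0],
})

features = holders.drop(columns=["wallet"])
X = StandardScaler().fit_transform(features)          # put features on a common scale

holders["segment"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Average profile per segment; a whitelist could then include or exclude
# whole segments, or wallets above a chosen score threshold.
print(holders.groupby("segment")[list(features.columns)].mean())
```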
Do you wanna include only people with above a certain score or, you know, have had certain engagement? [00:19:06] So I think it's like really meant for, you know, hopefully for projects to better understand. , Hey, let's, you know, make this let's make this white list. Like for people that really care about NFTs or like people that aren't just gonna like immediately dump, um, these NFTs. So there's a lot of ways you can do this. [00:19:24] I, I would want to almost optimize this for like, engagement with NFTs instead of just like farming to dump. But I think that's where the hard part comes in. [00:19:36] Rantum: Yeah. Yeah. I mean, I think we're seeing, I mean, we're so sort of a misalignment, so many places of, you know, where are the, you know, what is activity and, you know, what, where is the, you know, where are the royalties coming from? [00:19:46] When you're thinking about all these, uh, these different issues, it seems like we come back to the same thing. The people that are are selling are the ones that are co, I mean, you know, obviously a, a cell requires a buy, but that's when the, the income comes into a project. When. Ideally you just want nobody to, to really wanna sell. [00:20:05] Right? And [00:20:05] Elbarto Crypto: yes. Yes, [00:20:07] Rantum: exactly. How do you, I don't know. I'm not, you know, I see that one, I know that one of the projects that you wanna work on is, uh, trying to calculate the royalty income and, you know, do you see that as being a sustainable model going forward? [00:20:21] Elbarto Crypto: I, you know, I'm very mixed on this because I used to think, like, I used to want projects. [00:20:26] I used to really like the royalty initiative because I think it is fair in some sense. . I think if we were to eliminate royalties like tomorrow, I think they would force N F T brands to start thinking of like alternative business models because yeah, like as we said over and over again, like the royalty optimizes towards people selling. [00:20:47] In reality, it should be the complete flip of that. Like the most engaged people, like across every brand, the most engaged people with the brand are the ones like creating the revenue for you. I, while I, I don't know exactly where I fall in the royalty space right now. I, I, you know, it's not, luckily it's not my job to do that. [00:21:07] Yeah, right. . But I think that like, it would force n f T brands to think of new revenue streams that I only think will help them. Because right now royalties are obviously limited to people that are buying and selling NFTs. And the goal is to. a brand. Just a brand in general, right? Like when you think about Ferrari, like everyone knows Ferrari. [00:21:30] Their, their brand recognition is beyond anything. They, I, I was surprised though. I was looking at their, their income statement and they, the majority of their money, they still make off of selling cars, selling parts, et cetera. I think only like 15% of their revenue comes from like merchandising and things like that. [00:21:46] For an N F T brand, that should be the opposite. It, in my opinion, like 15% should come from royalties and then 85% should come from some other I don't know what that is. Hopefully someone smarter than they can figure that out, but yeah. Yeah, it's [00:21:58] Rantum: tough to say what that is because they're, you know, as collectors. [00:22:02] I, I think people expect to get something from that. Mm-hmm. . 
And as we've seen recently with the, uh, artifact uh, Nike, uh, project, you know, people are not terribly happy when the, the, the next part was the ability to buy something and, you know, maybe a discount. So, you know, I think that still has to be worked out and I'm, I'm curious where that does go. [00:22:23] Yeah, I'm not , I'm not exactly sure where I fall overall, you know, I'm very pro royalty. Artist, but I think that's a very different thing than, than these 10,000 piece collections. You know, when I think of art, you know, I'm thinking of small collections or, or even one of one. I see that as being a sustainable model going forward. [00:22:41] Very different than, you know, 5,000 or 10,000 because you've always got other people that are, I mean, it becomes a bigger issue for the liquidity of a project when you've got a five or 10%, um, royalty, uh, fee built in. [00:22:54] Elbarto Crypto: Oh yeah, absolutely. Like the art box, when you have only have 200 of them, you would hope, yeah, you would hope that some sort of wallet royalty, right? [00:23:01] Yeah. But um, it's also, yeah, like when people expect something, it's like they, they don't understand like the, the basic like lifetime value, the customer acquisition cost ratio, which is, which is completely flipped now because if I buy a doodle right now, uh, doodle will get, you know, whatever the 3% of. [00:23:24] $6,000 sale. So let's call it $180. Like great. Uh, yeah. That, that's tough for them. . Yeah. Yeah. To give me more, uh, to give me, I, yeah. I think people definitely expect $180 more of stuff. And I know, I hear, I hear doodle putt was really fun, but. . [00:23:44] Rantum: Yeah. I didn't make it over there. I was, I was down in Miami and I did not get to go to that. [00:23:48] Elbarto Crypto: Unfortunately. , I'm sure, uh, I'm sure many of the doodle holders though are, are, uh, negative. Uh, yeah. So that's like one of the issues, right? Like if you become a doodle holder, you expect the world, but like the, the economics just don't work yet. Hopefully they do in the future, but Yeah. Or like, I mean, I think this, like, I think this introduces like new. [00:24:10] New forms of business. Like if, if doodles were to, you know, create these events, but like you could lend your N F t to somebody to go, or you know, that they would pay a fee and like lend a doodle or something like that. I think like there could be interesting innovation there. It's just. [00:24:28] Rantum: Yeah, it is, it does feel like it's still to be determined how, how to really structure these for, for a longer term. [00:24:35] You know, it, it reminds me a bit of. Just as you saw advertising in, you know, online advertising, it got more and more expensive and, you know, there's more promotions and, and things to get people to come and, and click. And, you know, you're talking about the, the lifetime acquisition. You know, it went from early on when you were advertising on, on Google, you know, you could make. [00:24:54] You know, you could even maybe make a profit on, on a sale or something. And then it turned into a longer and longer lifetime and you just saw that the cost go up as it sort of got inflated. And, you know, I think Barta that was maybe, you know, there was obviously more money being spent and you saw that there was a lot of, a lot of venture capital coming in. [00:25:12] Um, I'm wondering like how that's going to start affecting. , you know, these rewards and everything. 
You know, if it's going to be, you know, in, I mean, I assume that it's going to get redistributed, they're like less middle men. You know, you can, it's being given directly and, you know, it's hard to recognize what is, what's, what's real versus, you know, what's sort of artificially pumped in [00:25:32] Elbarto Crypto: Yeah, absolutely. I, yeah, and it'll probably take another year to play out to see what these, yeah. What this looks like. But, [00:25:38] Rantum: um, what else are you, uh, excited about? Right. . [00:25:40] Elbarto Crypto: I think definitely like the more of the in-person experiences, um, within the NFT space. I think like I, I, I am glad everyone is mad at like Live Nation and Ticketmaster right now because I, I hope that, uh, I hope that somehow the N F T space, uh, can solve for that somehow. [00:25:59] Uh, like I'm a big fan of India Top Shot. I, the product is beautiful. It's just a, it's a great experience, but you know, like as a, as a, as a top shot, like minnow, not quite a plankton, I'll, I'll call myself a minnow. Uh, you know, I'm definitely looking for that, like further engagement of like, with, you know, with the team that I like or with the player that I like. [00:26:22] Like how, how does that like work and, and, and sort of like what's the next step for them. Certainly looking for, you know, forward to that. And then I think like on like the zuki uh, collection, you know, I. , I kind of wanna see. They have great, like website. They have a great website right now. I think they're, you know, they're branding. [00:26:40] It's really on spot. I'm, I'm really a [00:26:42] Rantum: lot with, uh, with, with wearables as well, right? Yeah, [00:26:45] Elbarto Crypto: exactly. Like I saw, I, I, I was just like walking around, uh, where I lived the other day and someone was wearing, wearing a, um, a Bathing Ape shirt and I'm like, I'm like, oh man, I, I'm kind of like, I'm hoping like bored ape supplements that, or something like that. [00:26:58] So I'm definitely looking for these like real world experiences, like the bridge between the two and then, yeah, like in like the new innovations of, of business models for these, uh, and like when, you know, when everything will just airdrop, you know, that's what I'm looking for. . [00:27:13] Rantum: Yeah. Right. Air when Airdrop [00:27:14] Yeah. When [00:27:15] Elbarto Crypto: airdrop, I'm just gonna mask token. Yeah, yeah. Exactly. Just one day, just give it all to me, you know? Right. Yeah. [00:27:21] Rantum: That would be, those were nice back, back when those were just coming out every couple weeks. Right. it, it was nice. Yeah, it was [00:27:28] Elbarto Crypto: nice. Uh, have [00:27:30] Rantum: you been to, uh, many in-person events or have you gotten to get to. [00:27:33] Elbarto Crypto: I, you know, it's funny, before the pandemic I used to go to a lot and I tell this funny story of, um, I went to a, this is like really gonna show my age. I went to a Dere meetup which is, for those who don't know, is, uh, is a, a Bitcoin esque product. And, um, , they, the where I went, they ordered, um, they ordered 10 pizzas, but only eight people went to the event. [00:27:57] So, uh, that just kind of shows you like how, uh, popular crypto was in [00:28:03] Rantum: 2018. [00:28:05] Elbarto Crypto: I wanna say. This was, um, I bet they didn't pay for the pizza with Bitcoin anyway. No, they, no, they did not. Uh, they used good old Uncle Sam dollars. So it's. It's really interesting to see like how how far things have come. 
And, and I was, uh, I went to an event, like a coup, a pretty small event, um, about a couple months ago. [00:28:25] And, um, it's good to see the energy back with people. Um, I like to see that. I, I, uh, I think I'm gonna make a huge, you know, I'm, I think I'm gonna make a splash next year by trying to go to some of these places, but, um, it seems like, uh, it seems like, yeah, there's good energy there. . Uh, yeah, [00:28:42] Rantum: I've enjoyed getting out to some events. [00:28:44] Just was at Basl as I, as I mentioned, and, uh, hoping to get to N ft N YC again, uh, this coming year. They seem to just, uh, make, make it difficult and move the month every , every event, . Um, lovely. So you can't really, you can never predict it, but , did you say anything? [00:29:01] Elbarto Crypto: Was there any interesting activations at, at Basil? [00:29:05] Uh, [00:29:06] Rantum: so NFT now had a big. Two blocks are so. Just for their own event. But then they had different booths within there where art blocks was there and Meta mask was there. And, uh, nine dcc, uh, which I talked about, um, recently, they had a min a shirt there, a one of 1200 was a snow fro. Oh. Uh, inspired art on this shirt. [00:29:32] So they worked with him on that. So there were some, there were some definitely cool, uh, N F T events. Um, you know, it was a little quieter in the N F T areas than, than maybe I I expected, uh, compared to, compared to something like, I mean, N F T NYC is just, it's, it's pretty big, you know? I know there's. [00:29:49] There's some complaints about, uh, the event. And, and I would also say that you don't necessarily have to go to, to the official event to, uh, to find many things to do. I mean, that's, that's, that's definitely the case at Basel, you know, where it's very unofficial. It's part of ma maybe it's part of Miami Art Week. [00:30:05] I'm not sure if it's technically even, even that. Mm-hmm. Mm-hmm. Right, right. But yeah, the in-person events I think are great. You know, I think people are still kind of getting out and I think part of, you know, even the, the fashion things, it's part of kind of bringing. Off the screen, off the, you know, off the computer and making it a little bit more real [00:30:23] Right. Exactly. [00:30:25] Elbarto Crypto: Step away from farming. It'll be, you'll be okay. You don't have to get all of them. [00:30:29] Rantum: Yeah. Right. , So you have, you have some new dashboards coming soon, but you are, you're well right now. It sounds like you are, uh, paternity leave. Huh? [00:30:38] Elbarto Crypto: trying to figure out how to make, how to do a swaddle. Yeah. So, oh man. Yeah, [00:30:42] Rantum: I remember that one. . [00:30:44] Elbarto Crypto: Yeah. Any need tips? Let me know. Yeah, I mean, I'm always looking on, you know, always looking for new data. [00:30:48] Um, always trying to build, you know, Data dashboards really to help people understand, you know, how data's being surfaced. You know, I've got a lot of like, you know, research projects on the back burner. Really trying, really, like trying to go after this question of like N F T engagement and you know, what kind of, who will be around for the long haul. [00:31:07] And then really, I am very curious about this defi N F T overlap. I, I think there's very, you know, I think the two are very separate right now, and I think people just assume by building a, a lending NFT platform that everyone will just come, you know, no questions asked. 
But in reality, the reason people got into NFTs is not because. [00:31:33] they're also excited about uncollateralized lending. Like they don't know what any of those words mean. And frankly, I don't think , I [00:31:40] Rantum: don't think anyone. Right. Yeah, that's good. Good [00:31:42] Elbarto Crypto: point. Yeah. So I, I, but then at the other hand, like on the other hand, I, I think, you know, if you have a board ape, you know, if you did get lucky and held on for all this time and you're sitting on, you know, $70,000, , it would be nice, you know, to realize some of these profits or, you know, maybe you have a dog who's going off to college and, and you need to pay tuition. [00:32:07] Uh, you know, it would be nice to realize maybe the little, these profits, um, yeah, [00:32:11] Rantum: right. For fractional [00:32:12] Elbarto Crypto: ownership. So I think it's definitely an un, an untapped area. I, I don't know how it'll work though in terms of UI and, and execut. [00:32:21] Rantum: what, uh, what, what, what have you been active with in what, in your wallet Recently? [00:32:27] Elbarto Crypto: What has been my wallet? I, you know, I'm like a, a real, a real defi, uh, degen. Uh, recently I've been, uh, I've been, I was like really bad farms that I've been going into. It's, uh, it's pretty sad. I, I've been trying to figure out recently, you know, some of these like NFT projects that have really gotten sold off. [00:32:47] there's a lot of work going behind the scenes. So like Rumble Kongs, for example. Um, this was like a project that really, you know, had a lot of hype. Um, so I own a couple Rumble Kongs in, in full transparency. This is a project that got a lot of hype. I think Steph Curry was wearing a Rumble Kong hat at some point. [00:33:04] Okay. ? Yeah. And like they have people working on it behind the scenes. You know, there are, there are truly people working. They are alive. Um, they are, you know, programming. I'm, I'm gonna check, I'm gonna check the floor [00:33:15] Rantum: price right now. Here's mine. And the ones that are, they're still busy and haven't left. [00:33:20] I mean, we know that a lot of these are going, they're going to zero. And, um, it is. You know, finding the ones that are still busy and gonna keep building through, you know, through the bear. That's, that's, uh, it's key if you can find them . [00:33:34] Elbarto Crypto: Right, exactly. So, yeah, you know, I'm kind of looking at that area a little bit. [00:33:38] I've been trying to like, you know, look at more like these illiquid art blocks, collections. Yeah, . I just don't know right now like what the best way. I don't know what the best way to display them or engage with like other people or like really engage with the artists is yet. So I'm still trying to like, I also just don't want to get ripped off by like buying, you know, something for like four E and then being like, what have I done, like [00:34:02] Rantum: immediately? [00:34:02] Right. But I mean, there's some, there are some very pricey, very illiquid pieces in, uh, in our blocks. You know, there's definitely some, some grape buys. Uh, can be difficult to tell the difference. And, you know, that that's the beauty of NFTs, right? , yeah, exactly. [00:34:18] Elbarto Crypto: Um, you know, I, I respect what the pudgy team is doing. [00:34:20] Uh, I don't own any poos maybe, you know? Yeah. [00:34:24] Rantum: That is impressive. They're, they're the strength of the holders. The D Gen score is high, huh? Absolutely. 
[00:34:29] Elbarto Crypto: Yeah. And they're really, uh, they're really executing there. So, um, you know, shout out to the team there for. Putting something good together. So yeah, that's like, I've been really trying to think about like, you know, what projects are still actively being worked on and, and sort of like, can you scoop up any, any good values? [00:34:46] Uh, yeah. And like, you have to like what it is, right? I mean, like, I don't, like if I buy something I, you know, honestly, like I don't wanna sell it. I, I, uh, You know, I'll just hold on [00:34:55] forever. [00:34:56] Rantum: That's the best way, right? Just by what you, what you actually want to own . [00:34:59] Elbarto Crypto: Yeah, exactly. Yeah. That way if it goes zero, you feel a little less bad. [00:35:04] So [00:35:04] Rantum: yeah. Hey, we all have at least a few of those, right? Oh yeah. Oh yeah. Awesome. Uh, so where can people find you and, uh, [00:35:12] Elbarto Crypto: You. Yeah. You can find a very inactive crypto account starting today at El barto underscore crypto on Twitter. Uh, I'll be back in February. Don't worry. [00:35:22] Rantum: You do have a list of, uh, research for I, you know, people will wanna get some homework. [00:35:25] There's, there's some . [00:35:27] Elbarto Crypto: Yeah. If you feel like you have nothing to do over the, over the break and, uh, you wanna. do some NFT research. I have plenty of projects for people to hand out. So Is the baby here already? No. No. It, it's coming soon. Yeah. All right. Well, [00:35:42] Rantum: so very exciting. That's awesome. Um, anything else you wanna add before we sign off here? [00:35:47] Elbarto Crypto: You know, I would add if anyone has any advice on having a bo a dog stop barking in a shadow, like please reach out to me because, uh, it's been a, a constant thorn on my side. He's a great, he's a great pal and a great farmer, so welcome to you. [00:36:01] Rantum: That shadow's throwing him. Awesome. Well, thank you so much Alberto Crypto. [00:36:04] This is awesome. First interview. Uh, very excited, and we'll have to talk again soon. [00:36:10] Elbarto Crypto: Thank you, sir. Thank you, sir. All right.  

Screaming in the Cloud
Holiday Replay Edition - The Staying Power of Kubernetes with Kelsey Hightower

Screaming in the Cloud

Play Episode Listen Later Dec 15, 2022 43:04


About KelseyKelsey Hightower is the Principal Developer Advocate at Google, the co-chair of KubeCon, the world's premier Kubernetes conference, and an open source enthusiast. He's also the co-author of Kubernetes Up & Running: Dive into the Future of Infrastructure.Links: Twitter: @kelseyhightower Company site: Google.com Book: Kubernetes Up & Running: Dive into the Future of Infrastructure TranscriptAnnouncer: Hello and welcome to Screaming in the Cloud, with your host Cloud economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I'm joined this week by Kelsey Hightower, who claims to be a principal developer advocate at Google, but based upon various keynotes I've seen him in, he basically gets on stage and plays video games like Tetris in front of large audiences. So I assume he is somehow involved with e-sports. Kelsey, welcome to the show.Kelsey: You've outed me. Most people didn't know that I am a full-time e-sports Tetris champion at home. And the technology thing is just a side gig.Corey: Exactly. It's one of those things you do just to keep the lights on, like you're waiting to get discovered, but in the meantime, you're waiting table. Same type of thing. Some people wait tables you more or less a sling Kubernetes, for lack of a better term.Kelsey: Yes.Corey: So let's dive right into this. You've been a strong proponent for a long time of Kubernetes and all of its intricacies and all the power that it unlocks and I've been pretty much the exact opposite of that, as far as saying it tends to be over complicated, that it's hype-driven and a whole bunch of other, shall we say criticisms that are sometimes bounded in reality and sometimes just because I think it'll be funny when I put them on Twitter. Where do you stand on the state of Kubernetes in 2020?Kelsey: So, I want to make sure it's clear what I do. Because when I started talking about Kubernetes, I was not working at Google. I was actually working at CoreOS where we had a competitor Kubernetes called Fleet. And Kubernetes coming out kind of put this like fork in our roadmap, like where do we go from here? What people saw me doing with Kubernetes was basically learning in public. Like I was really excited about the technology because it's attempting to solve a very complex thing. I think most people will agree building a distributed system is what cloud providers typically do, right? With VMs and hypervisors. Those are very big, complex distributed systems. 
And before Kubernetes came out, the closest I'd gotten to a distributed system before working at CoreOS was just reading the various white papers on the subject and hearing stories about how Google has systems like Borg tools, like Mesa was being used by some of the largest hyperscalers in the world, but I was never going to have the chance to ever touch one of those unless I would go work at one of those companies.So when Kubernetes came out and the fact that it was open source and I could read the code to understand how it was implemented, to understand how schedulers actually work and then bonus points for being able to contribute to it. Those early years, what you saw me doing was just being so excited about systems that I attended to build on my own, becoming this new thing just like Linux came up. So I kind of agree with you that a lot of people look at it as a more of a hype thing. They're looking at it regardless of their own needs, regardless of understanding how it works and what problems is trying to solve that. My stance on it, it's a really, really cool tool for the level that it operates in, and in order for it to be successful, people can't know that it's there.Corey: And I think that might be where part of my disconnect from Kubernetes comes into play. I have a background in ops, more or less, the grumpy Unix sysadmin because it's not like there's a second kind of Unix sysadmin you're ever going to encounter. Where everything in development works in theory, but in practice things pan out a little differently. I always joke that ops is the difference between theory and practice. In theory, devs can do everything and there's no ops needed. In practice, well it's been a burgeoning career for a while. The challenge with this is Kubernetes at times exposes certain levels of abstraction that, sorry certain levels of detail that generally people would not want to have to think about or deal with, while papering over other things with other layers of abstraction on top of it. That obscure, valuable troubleshooting information from a running something in an operational context. It absolutely is a fascinating piece of technology, but it feels today like it is overly complicated for the use a lot of people are attempting to put it to. Is that a fair criticism from where you sit?Kelsey: So I think the reason why it's a fair criticism is because there are people attempting to run their own Kubernetes cluster, right? So when we think about the cloud, unless you're in OpenStack land, but for the people who look at the cloud and you say, "Wow, this is much easier." There's an API for creating virtual machines and I don't see the distributed state store that's keeping all of that together. I don't see the farm of hypervisors. So we don't necessarily think about the inherent complexity into a system like that, because we just get to use it. So on one end, if you're just a user of a Kubernetes cluster, maybe using something fully managed or you have an ops team that's taking care of everything, your interface of the system becomes this Kubernetes configuration language where you say, "Give me a load balancer, give me three copies of this container running." And if we do it well, then you'd think it's a fairly easy system to deal with because you say, "kubectl, apply," and things seem to start running.Just like in the cloud where you say, "AWS create this VM, or G cloud compute instance, create." You just submit API calls and things happen. 
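[Editor's note: not Kelsey's workflow, just a sketch of the "give me a load balancer, give me three copies of this container" contract he describes, expressed as the same declarative config you would normally hand to `kubectl apply -f`. Here it is submitted through the kubernetes Python client (assumed installed via `pip install kubernetes`), against a cluster whose cloud-provider integration can provision load balancers; the names and image are placeholders.]

```python
# Sketch: declare three replicas of a container plus a LoadBalancer Service,
# and submit the manifests programmatically (roughly what `kubectl apply -f`
# does on first create).
import yaml
from kubernetes import client, config, utils

MANIFEST = """
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 3
  selector:
    matchLabels: {app: hello}
  template:
    metadata:
      labels: {app: hello}
    spec:
      containers:
      - name: hello
        image: nginx:1.25
        ports: [{containerPort: 80}]
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: LoadBalancer        # the cloud-provider extension maps this to an ELB/GCLB/etc.
  selector: {app: hello}
  ports: [{port: 80, targetPort: 80}]
"""

config.load_kube_config()             # same credentials kubectl would use
api = client.ApiClient()
for doc in yaml.safe_load_all(MANIFEST):
    utils.create_from_dict(api, doc)  # create each object in the default namespace
```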
I think the fact that Kubernetes is very transparent to most people is, now you can see the complexity, right? Imagine everyone driving with the hood off the car. You'd be looking at a lot of moving things, but we have hoods on cars to hide the complexity and all we expose is the steering wheel and the pedals. That car is super complex but we don't see it. So therefore we don't attribute as complexity to the driving experience.Corey: This to some extent feels it's on the same axis as serverless, with just a different level of abstraction piled onto it. And while I am a large proponent of serverless, I think it's fantastic for a lot of Greenfield projects. The constraints inherent to the model mean that it is almost completely non-tenable for a tremendous number of existing workloads. Some developers like to call it legacy, but when I hear the term legacy I hear, "it makes actual money." So just treating it as, "Oh, it's a science experiment we can throw into a new environment, spend a bunch of time rewriting it for minimal gains," is just not going to happen as companies undergo digital transformations, if you'll pardon the term.Kelsey: Yeah, so I think you're right. So let's take Amazon's Lambda for example, it's a very opinionated high-level platform that assumes you're going to build apps a certain way. And if that's you, look, go for it. Now, one or two levels below that there is this distributed system. Kubernetes decided to play in that space because everyone that's building other platforms needs a place to start. The analogy I like to think of is like in the mobile space, iOS and Android deal with the complexities of managing multiple applications on a mobile device, security aspects, app stores, that kind of thing. And then you as a developer, you build your thing on top of those platforms and APIs and frameworks. Now, it's debatable, someone would say, "Why do we even need an open-source implementation of such a complex system? Why not just everyone moved to the cloud?" And then everyone that's not in a cloud on-premise gets left behind.But typically that's not how open source typically works, right? The reason why we have Linux, the precursor to the cloud is because someone looked at the big proprietary Unix systems and decided to re-implement them in a way that anyone could run those systems. So when you look at Kubernetes, you have to look at it from that lens. It's the ability to democratize these platform layers in a way that other people can innovate on top. That doesn't necessarily mean that everyone needs to start with Kubernetes, just like not everyone needs to start with the Linux server, but it's there for you to build the next thing on top of, if that's the route you want to go.Corey: It's been almost a year now since I made an original tweet about this, that in five years, no one will care about Kubernetes. So now I guess I have four years running on that clock and that attracted a bit of, shall we say controversy. There were people who thought that I meant that it was going to be a flash in the pan and it would dry up and blow away. But my impression of it is that in, well four years now, it will have become more or less system D for the data center, in that there's a bunch of complexity under the hood. It does a bunch of things. No-one sensible wants to spend all their time mucking around with it in most companies. 
But it's not something that people have to think about in an ongoing basis the way it feels like we do today.Kelsey: Yeah, I mean to me, I kind of see this as the natural evolution, right? It's new, it gets a lot of attention and kind of the assumption you make in that statement is there's something better that should be able to arise, giving that checkpoint. If this is what people think is hot, within five years surely we should see something else that can be deserving of that attention, right? Docker comes out and almost four or five years later you have Kubernetes. So it's obvious that there should be a progression here that steals some of the attention away from Kubernetes, but I think where it's so new, right? It's only five years in, Linux is like over 20 years old now at this point, and it's still top of mind for a lot of people, right? Microsoft is still porting a lot of Windows only things into Linux, so we still discuss the differences between Windows and Linux.The idea that the cloud, for the most part, is driven by Linux virtual machines, that I think the majority of workloads run on virtual machines still to this day, so it's still front and center, especially if you're a system administrator managing BDMs, right? You're dealing with tools that target Linux, you know the Cisco interface and you're thinking about how to secure it and lock it down. Kubernetes is just at the very first part of that life cycle where it's new. We're all interested in even what it is and how it works, and now we're starting to move into that next phase, which is the distro phase. Like in Linux, you had Red Hat, Slackware, Ubuntu, special purpose distros.Some will consider Android a special purpose distribution of Linux for mobile devices. And now that we're in this distro phase, that's going to go on for another 5 to 10 years where people start to align themselves around, maybe it's OpenShift, maybe it's GKE, maybe it's Fargate for EKS. These are now distributions built on top of Kubernetes that start to add a little bit more opinionation about how Kubernetes should be pushed together. And then we'll enter another phase where you'll build a platform on top of Kubernetes, but it won't be worth mentioning that Kubernetes is underneath because people will be more interested on the thing above.Corey: I think we're already seeing that now, in terms of people no longer really care that much what operating system they're running, let alone with distribution of that operating system. The things that you have to care about slip below the surface of awareness and we've seen this for a long time now. Originally to install a web server, it wound up taking a few days and an intimate knowledge of GCC compiler flags, then RPM or D package and then yum on top of that, then ensure installed, once we had configuration management that was halfway decent.Then Docker run, whatever it is. And today feels like it's with serverless technologies being what they are, it's effectively a push a file to S3 or it's equivalent somewhere else and you're done. The things that people have to be aware of and the barrier to entry continually lowers. The downside to that of course, is that things that people specialize in today and effectively make very lucrative careers out of are going to be not front and center in 5 to 10 years the way that they are today. And that's always been the way of technology. 
It's a treadmill to some extent.Kelsey: And on the flip side of that, look at all of the new jobs that are centered around these cloud-native technologies, right? So you know, we're just going to make up some numbers here, imagine if there were only 10,000 jobs around just Linux system administration. Now when you look at this whole Kubernetes landscape where people are saying we can actually do a better job with metrics and monitoring. Observability is now a thing culturally that people assume you should have, because you're dealing with these distributed systems. The ability to start thinking about multi-regional deployments when I think that would've been infeasible with the previous tools or you'd have to build all those tools yourself. So I think now we're starting to see a lot more opportunities, where instead of 10,000 people, maybe you need 20,000 people because now you have the tools necessary to tackle bigger projects where you didn't see that before.Corey: That's what's going to be really neat to see. But the challenge is always to people who are steeped in existing technologies. What does this mean for them? I mean I spent a lot of time early in my career fighting against cloud because I thought that it was taking away a cornerstone of my identity. I was a large scale Unix administrator, specifically focusing on email. Well, it turns out that there aren't nearly as many companies that need to have that particular skill set in house as it did 10 years ago. And what we're seeing now is this sort of forced evolution of people's skillsets or they hunker down on a particular area of technology or particular application to try and make a bet that they can ride that out until retirement. It's challenging, but at some point it seems that some folks like to stop learning, and I don't fully pretend to understand that. I'm sure I will someday where, "No, at this point technology come far enough. We're just going to stop here, and anything after this is garbage." I hope not, but I can see a world in which that happens.Kelsey: Yeah, and I also think one thing that we don't talk a lot about in the Kubernetes community, is that Kubernetes makes hyper-specialization worth doing because now you start to have a clear separation from concerns. Now the OS can be hyperfocused on security system calls and not necessarily packaging every programming language under the sun into a single distribution. So we can kind of move part of that layer out of the core OS and start to just think about the OS being a security boundary where we try to lock things down. And for some people that play at that layer, they have a lot of work ahead of them in locking down these system calls, improving the idea of containerization, whether that's something like Firecracker or some of the work that you see VMware doing, that's going to be a whole class of hyper-specialization. And the reason why they're going to be able to focus now is because we're starting to move into a world, whether that's serverless or the Kubernetes API.We're saying we should deploy applications that don't target machines. I mean just that step alone is going to allow for so much specialization at the various layers because even on the networking front, which arguably has been a specialization up until this point, can truly specialize because now the IP assignments, how networking fits together, has also abstracted a way one more step where you're not asking for interfaces or binding to a specific port or playing with port mappings. 
You can now let the platform do that. So I think for some of the people who may not be as interested in moving up the stack, they need to be aware that the number of people we need being hyper-specialized at Linux administration will definitely shrink. And a lot of that work will move up the stack, whether that's Kubernetes or managing a serverless deployment and all the configuration that goes with that. But if you are a Linux person, like that is your bread and butter, I think there's going to be an opportunity to go super deep, but you may have to expand into things like security and not just things like configuration management.Corey: Let's call it the unfulfilled promise of Kubernetes. On paper, I love what it hints at being possible. Namely, if I build something that runs well on top of Kubernetes then we truly have a write once, run anywhere type of environment. Stop me if you've heard that one before, 50,000 times in our industry... or history. But in practice, as has happened before, it seems like it tends to fall down for one reason or another. Now, Amazon is famous for many reasons, but the one that I like to pick on them for is, you can't say the word multi-cloud at their events. Right. That'll change people's perspective, good job. People tend to see multi-cloud through a couple of different lenses.I've been rather anti multi-cloud from the perspective of the idea that you're setting out day one to build an application with the idea that it can be run on top of any cloud provider, or even on-premises if that's what you want to do, is generally not the way to proceed. You wind up having to make certain trade-offs along the way, you have to rebuild anything that isn't consistent between those providers, and it slows you down. Kubernetes on the other hand hints at if it works and fulfills this promise, you can suddenly abstract an awful lot beyond that and just write generic applications that can run anywhere. Where do you stand on the whole multi-cloud topic?Kelsey: So I think we have to make sure we talk about the different layers that are kind of ready for this thing. So for example, like multi-cloud networking, we just call that networking, right? What's the IP address over there? I can just hit it. So we don't make a big deal about multi-cloud networking. Now there's an area where people say, how do I configure the various cloud providers? And I think the healthy way to think about this is, in your own data centers, right, so we know a lot of people have investments on-premises. Now, if you were to take the mindset that you only need one provider, then you would try to buy everything from HP, right? You would buy HP storage devices, you buy HP racks, power. Maybe HP doesn't sell air conditioners. So you're going to have to buy an air conditioner from a vendor who specializes in making air conditioners, hopefully for a data center and not your house.So now you've entered this world where one vendor doesn't make every single piece that you need. Now in the data center, we don't say, "Oh, I am multi-vendor in my data center." Typically, you just buy the switches that you need, you buy the power racks that you need, you buy the ethernet cables that you need, and they have common interfaces that allow them to connect together and they typically have different configuration languages and methods for configuring those components. The cloud on the other hand also represents the same kind of opportunity. 
There are some people who really love DynamoDB and S3, but then they may prefer something like BigQuery to analyze the data that they're uploading into S3. Now, if this was a data center, you would just buy all three of those things and put them in the same rack and call it good.But the cloud presents this other challenge. How do you authenticate to those systems? And then there's usually this additional networking cost, egress or ingress charges that make it prohibitive to say, "I want to use two different products from two different vendors." And I think that's-Corey: ...winds up causing serious problems.Kelsey: Yes, so that data gravity, the associated cost becomes a little bit more in your face. Whereas, in a data center you kind of feel that the cost has already been paid. I already have a network switch with enough bandwidth, I have an extra port on my switch to plug this thing in and they're all standard interfaces. Why not? So I think the multi-cloud gets lost in the chew problem, which is the barrier to entry of leveraging things across two different providers because of networking and configuration practices.Corey: That's often the challenge, I think, that people get bogged down in. On an earlier episode of this show we had Mitchell Hashimoto on, and his entire theory around using Terraform to wind up configuring various bits of infrastructure, was not the idea of workload portability because that feels like the windmill we all keep tilting at and failing to hit. But instead the idea of workflow portability, where different things can wind up being interacted with in the same way. So if this one division is on one cloud provider, the others are on something else, then you at least can have some points of consistency in how you interact with those things. And in the event that you do need to move, you don't have to effectively redo all of your CI/CD process, all of your tooling, et cetera. And I thought that there was something compelling about that argument.Kelsey: And that's actually what Kubernetes does for a lot of people. For Kubernetes, if you think about it, when we start to talk about workflow consistency, if you want to deploy an application, kubectl apply some config, you want the application to have a load balancer in front of it. Regardless of the cloud provider, because Kubernetes has an extension point we call the cloud provider. And that's where Amazon, Azure, Google Cloud, we do all the heavy lifting of mapping the high-level ingress object that specifies, "I want a load balancer, maybe a few options," to the actual implementation detail. So maybe you don't have to use four or five different tools and that's where that kind of workload portability comes from. Like if you think about Linux, right? It has a set of system calls, for the most part, even if you're using a different distro at this point, Red Hat or Amazon Linux or Google's container optimized Linux.If I build a Go binary on my laptop, I can SCP it to any of those Linux machines and it's going to probably run. So you could call that multi-cloud, but that doesn't make a lot of sense because it's just because of the way Linux works. Kubernetes does something very similar because it sits right on top of Linux, so you get the portability just from the previous example and then you get the other portability in workflow, like you just stated, where I'm calling kubectl apply, and I'm using the same workflow to get resources spun up on the various cloud providers, even if that configuration isn't one-to-one identical. 
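As an aside for readers following along, here is a minimal sketch of the provider-neutral request Kelsey is describing, written with the official Kubernetes Python client rather than kubectl. This is not code from the episode: the service name, labels, and ports are invented for illustration, and the only point it makes is that the object asks for a load balancer in Kubernetes terms and leaves the provisioning to whichever cloud-provider integration the cluster runs with.

```python
# A hedged sketch of the provider-neutral workflow described above.
# Names like "hello-web" are made up purely for this example.
from kubernetes import client, config

config.load_kube_config()  # talks to whatever cluster the current kubeconfig points at

service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="hello-web"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",                       # ask for an external load balancer
        selector={"app": "hello-web"},             # route to pods carrying this label
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)

# The same call works against a GKE, EKS, or AKS cluster; only the load balancer
# that the cloud-provider integration creates behind the scenes differs.
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```

The design point is the one Kelsey makes: the workflow of applying a generic object and letting the cluster's cloud-provider integration map it to real infrastructure stays the same even when the underlying clouds do not.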
Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.Corey: One thing I'm curious about is you wind up walking through the world and seeing companies adopting Kubernetes in different ways. How are you finding the adoption of Kubernetes looks inside of big E enterprise style companies? I don't have as much insight into those environments as I probably should. That's sort of a focus area for the next year for me. But in startups, it seems that it's either someone goes in and rolls it out and suddenly it's fantastic, or they avoid it entirely and do something serverless. In large enterprises, I see a lot of Kubernetes and a lot of Kubernetes stories coming out of it, but what isn't usually told is, what's the tipping point where they say, "Yeah, let's try this." Or, "Here's the problem we're trying to solve for. Let's chase it."Kelsey: What I see is enterprises buy everything. If you're big enough and you have a big enough IT budget, most enterprises have a POC of everything that's for sale, period. There's some team in some pocket, maybe they came through via acquisition. Maybe they live in a different state. Maybe it's just a new project that came out. And what you tend to see, at least from my experiences, if I walk into a typical enterprise, they may tell me something like, "Hey, we have a POC of Pivotal Cloud Foundry, OpenShift, and we want some of that new thing that we just saw from you guys. How do we get a POC going?" So there's always this appetite to evaluate what's for sale, right? So, that's one case. There's another case where, when you start to think about an enterprise there's a big range of skillsets. Sometimes I'll go to some companies like, "Oh, my insurance is through that company, and there's ex-Googlers that work there." They used to work on things like Borg, or something else, and they kind of know how these systems work.And they have a slightly better edge at evaluating whether Kubernetes is any good for the problem at hand. And you'll see them bring it in. Now that same company, I could drive over to the other campus, maybe it's five miles away and that team doesn't even know what Kubernetes is. And for them, they're going to be chugging along with what they're currently doing. So then the challenge becomes if Kubernetes is a great fit, how wide of a fit is it? How many teams at that company should be using it? So what I'm currently seeing is there are some enterprises that have found a way to make Kubernetes the place where they do a lot of new work, because that makes sense. A lot of enterprises to my surprise though, are actually stepping back and saying, "You know what? We've been stitching together our own platform for the last five years. We had the Netflix stack, we got some Spring Boot, we got Consul, we got Vault, we got Docker. 
And now this whole thing is getting a little more fragile because we're doing all of this glue code. We've been trying to build our own Kubernetes, and now that we know what it is and we know what it isn't, we know that we can probably get rid of this kind of bespoke stack ourselves." And just because of the ecosystem, right? If I go to HashiCorp's website, I would probably find the word Kubernetes as much as I find the word Nomad on their site because they've made things like Consul and Vault become first-class offerings inside of the world of Kubernetes. So I think it's that momentum that you see across even people like Oracle, Juniper, Palo Alto Networks, they all seem to have a Kubernetes story. And this is why you start to see the enterprise able to adopt it because it's so much in their face and it's where the ecosystem is going.Corey: It feels like a lot of the excitement and the promise and even the same problems that Kubernetes is aimed at today, could have just as easily been talked about half a decade ago in the context of OpenStack. And for better or worse, OpenStack is nowhere near where it once was. It felt like it had such promise and such potential and when it didn't pan out, that left a lot of people feeling relatively sad, burnt out, depressed, et cetera. And I'm seeing a lot of parallels today, at least between what was said about OpenStack and what was said about Kubernetes. How do you see those two diverging?Kelsey: I will tell you the big difference that I saw, personally. Just for my personal journey outside of Google, just having that option. And I remember I was working at a company and we were like, "We're going to roll our own OpenStack. We're going to buy a FreeBSD box and make it a file server. We're going all open source," like do whatever you want to do. And that was just having so many issues in terms of first-class integrations, education, people with the skills to even do that. And I was like, "You know what, let's just cut the check for VMware." We want virtualization. VMware, for the cost and what it does, it's good enough. Or we can just actually use a cloud provider. That space in many ways was a purely solved problem. Now, let's fast forward to Kubernetes, and also when you get OpenStack finished, you're just back where you started.You got a bunch of VMs and now you've got to go figure out how to build the real platform that people want to use because no one just wants a VM. If you think Kubernetes is low level, with just OpenStack, even if OpenStack was perfect, you're still at square one for the most part. Maybe you can just say, "Now I'm paying a little less money for my stack in terms of software licensing costs," but from an abstraction and automation and API standpoint, I don't think OpenStack moved the needle in that regard. Now in the Kubernetes world, it's solving a huge gap.Lots of people had virtual machine sprawl, then they had Docker sprawl, and when you bring in this thing like Kubernetes, it says, "You know what? Let's rein all of that in. Let's build some first-class abstractions, assuming that the layer below us is a solved problem." You got to remember when Kubernetes came out, it wasn't trying to replace the hypervisor, it assumed it was there. It also assumed that the hypervisor had APIs for creating virtual machines and attaching disks and creating load balancers, so Kubernetes came out as a complementary technology, not one looking to replace. 
And I think that's why it was able to stick because it solved a problem at another layer where there was not a lot of competition.Corey: I think a more cynical take, at least one of the ones that I've heard articulated and I tend to agree with, was that OpenStack originally seemed super awesome because there were a lot of interesting people behind it, fascinating organizations, but then you wound up looking through the backers of the foundation behind it and the rest. And there were something like 500 companies behind it, an awful lot of them were these giant organizations that ... they were big e-corporate IT enterprise software vendors, and you take a look at that, I'm not going to name anyone because at that point, oh will we get letters.But at that point, you start seeing so many of the patterns being worked into it that it almost feels like it has to collapse under its own weight. I don't, for better or worse, get the sense that Kubernetes is succumbing to the same thing, despite the CNCF having an awful lot of those same backers behind it and as far as I can tell, significantly more money, they seem to have all the money to throw at these sorts of things. So I'm wondering how Kubernetes has managed to effectively sidestep I guess the open-source miasma that OpenStack didn't quite manage to avoid.Kelsey: Kubernetes gained its own identity before the foundation existed. Its purpose, if you think back from the Borg paper almost eight years prior, maybe even 10 years prior. It defined this problem really, really well. I think Mesos came out and also had a slightly different take on this problem. And you could just see at that time there was a real need, you had choices between Docker Swarm, Nomad. It seems like everybody was trying to fill in this gap because, across most verticals or industries, this was a true problem worth solving. What Kubernetes did was played in the exact same sandbox, but it kind of got put out with experience. It's not like, "Oh, let's just copy this thing that already exists, but let's just make it open."And in that case, you don't really have your own identity. It's you versus Amazon, in the case of OpenStack, it's you versus VMware. And that's just really a hard place to be in because you don't have an identity that stands alone. Kubernetes itself had an identity that stood alone. It comes from this experience of running a system like this. It comes from research and white papers. It comes after previous attempts at solving this problem. So we agree that this problem needs to be solved. We know what layer it needs to be solved at. We just didn't get it right yet, so Kubernetes didn't necessarily try to get it right.It tried to start with only the primitives necessary to focus on the problem at hand. Now to your point, the extension interface of Kubernetes is what keeps it small. Years ago I remember plenty of meetings where we all got in rooms and said, "This thing is done." It doesn't need to be a PaaS. It doesn't need to compete with serverless platforms. The core of Kubernetes, like Linux, is largely done. Here's the core objects, and we're going to make a very great extension interface. We're going to make one for the container run time level so that way people can swap that out if they really want to, and we're going to do one that makes other APIs as first-class as ones we have, and we don't need to try to boil the ocean in every Kubernetes release. 
Everyone else has the ability to deploy extensions just like Linux, and I think that's why we're avoiding some of this tension in the vendor world because you don't have to change the core to get something that feels like a native part of Kubernetes.Corey: What do you think is currently being the most misinterpreted or misunderstood aspect of Kubernetes in the ecosystem?Kelsey: I think the biggest thing that's misunderstood is what Kubernetes actually is. And the thing that made it click for me, especially when I was writing the tutorial Kubernetes The Hard Way: I had to sit down and ask myself, "Where do you start trying to learn what Kubernetes is?" So I start with the database, right? The configuration store isn't Postgres, it isn't MySQL, it's etcd. Why? Because we're not trying to be this generic data store platform. We just need to store configuration data. Great. Now, do we let all the components talk to etcd? No. We have this API server and between the API server and the chosen data store, that's essentially what Kubernetes is. You can stop there. At that point, you have a valid Kubernetes cluster and it can understand a few things. Like I can say, using the Kubernetes command-line tool, create this configuration map that stores configuration data and I can read it back.Great. Now I can't do a lot of things that are interesting with that. Maybe I just use it as a configuration store, but then if I want to build a container platform, I can install the Kubernetes kubelet agent on a bunch of machines and have it talk to the API server looking for other objects; you add in the scheduler, all the other components. So what that means is that Kubernetes' most important component is its API because that's how the whole system is built. It's actually a very simple system when you think about just those two components in isolation. If you want a container management tool, then you need a scheduler, controller manager, cloud provider integrations, and now you have a container tool. But let's say you want a service mesh platform. Well in a service mesh you have a data plane that can be Nginx or Envoy and that's going to handle routing traffic. And you need a control plane. That's going to be something that takes in configuration and it uses that to configure all the things in a data plane.Well, guess what? Kubernetes is 90% there in terms of a control plane, with just those two components, the API server, and the data store. So now when you want to build control planes, if you start with the Kubernetes API, we call it the API machinery, you're going to be 95% there. And then what do you get? You get a distributed system that can handle kind of failures on the back end, thanks to etcd. You're going to get RBAC, or you can have permissions on top of your schemas, and there's a built-in framework, we call it custom resource definitions, that allows you to articulate a schema and then your own control loops provide meaning to that schema. And once you do those two things, you can build any platform you want. And I think that's one thing that it takes a while for people to understand that part of Kubernetes, that the thing we talk about today, for the most part, is just the first system that we built on top of this.
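To make that API-machinery idea concrete, here is a small, hedged sketch of the custom resource definition mechanism Kelsey mentions, again using the official Kubernetes Python client. The Widget type and the example.com group are invented purely for illustration; registering the definition teaches the API server and etcd a brand-new object type, and a separate control loop, not shown here, would supply the actual behavior.

```python
# A hedged sketch: register a hypothetical "Widget" custom resource definition.
# The group, kind, and schema are made up for illustration only.
from kubernetes import client, config

config.load_kube_config()

crd = client.V1CustomResourceDefinition(
    api_version="apiextensions.k8s.io/v1",
    kind="CustomResourceDefinition",
    metadata=client.V1ObjectMeta(name="widgets.example.com"),
    spec=client.V1CustomResourceDefinitionSpec(
        group="example.com",
        scope="Namespaced",
        names=client.V1CustomResourceDefinitionNames(
            plural="widgets", singular="widget", kind="Widget",
        ),
        versions=[
            client.V1CustomResourceDefinitionVersion(
                name="v1",
                served=True,
                storage=True,
                schema=client.V1CustomResourceValidation(
                    open_apiv3_schema=client.V1JSONSchemaProps(
                        type="object",
                        properties={
                            "spec": client.V1JSONSchemaProps(
                                type="object",
                                properties={
                                    "size": client.V1JSONSchemaProps(type="integer"),
                                },
                            ),
                        },
                    ),
                ),
            ),
        ],
    ),
)

# After this call, the API server stores and serves Widget objects like built-in ones.
client.ApiextensionsV1Api().create_custom_resource_definition(crd)
```

Once the definition is accepted, the new resource behaves much like a built-in one: it lives in etcd, is served by the API server, is covered by RBAC, and shows up in kubectl. A control loop watching Widget objects is the only piece left to write, which is the sense in which starting a new control plane from this machinery gets you most of the way there.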
Corey: I think that's a very far-reaching story with implications that I'm not entirely sure I am able to wrap my head around. I hope to see it, I really do. I mean you mentioned writing Kubernetes The Hard Way, your tutorial, which I'll link to in the show notes. My, of course, sarcastic response to that recently was to register the domain Kubernetes the Easy Way and just re-point it to Amazon's ECS, which is in no way shape or form Kubernetes and basically has the effect of irritating absolutely everyone as is my typical pattern of behavior on Twitter. But I have been meaning to dive into Kubernetes on a deeper level and the stuff that you've written, not just the online tutorial, both the books have always been my first port of call when it comes to that. The hard part, of course, is there's just never enough hours in the day.Kelsey: And one thing that I think about too is like the web. We have the internet, there's webpages, there's web browsers. Web browsers talk to web servers over HTTP. There's verbs, there's bodies, there's headers. And if you look at it, that's like a very big complex system. If I were to extract out the protocol pieces, this concept of HTTP verbs, get, put, post and delete, this idea that I can put stuff in a body and I can give it headers to give it other meaning and semantics. If I just take those pieces, I can build RESTful APIs.Hell, I can even build GraphQL and those are just different systems built on the same API machinery that we call the internet or the web today. But you have to really dig into the details and pull that part out and you can build all kinds of other platforms and I think that's what Kubernetes is. It's going to probably take people a little while longer to see that piece, but it's hidden in there and that's the piece that's going to be, like you said, it's going to probably be the foundation for building more control planes. And when people build control planes, I think if you think about it, maybe Fargate for EKS represents another control plane for making a serverless platform that takes the Kubernetes API, even though the implementation isn't what you find on GitHub.Corey: That's the truth. Whenever you see something as broadly adopted as Kubernetes, there's always the question of, "Okay, there's an awful lot of blog posts." Getting started with it, learn it in 10 minutes, I mean at some point, I'm sure there are some people still convinced Kubernetes is, in fact, a breakfast cereal based upon some of the stuff the CNCF has gotten up to. I wouldn't necessarily bet against it: socks today, breakfast cereal tomorrow. But it's hard to find a decent level of quality, and finding a certain quality bar, a trusted source to get started with, is important. Some people believe in the hero's journey style of narrative building.I always prefer to go with the moron's journey because I'm the moron. I touch technologies, I have no idea what they do and figure it out and go careening into edge and corner cases constantly. And by the end of it I have something that vaguely sort of works and my understanding's improved. But I've gone down so many terrible paths just by picking a bad point to get started. So everyone I've talked to who's actually good at things has pointed to your work in this space as being something that is authoritative and largely correct and given some of these people, that's high praise.Kelsey: Awesome. I'm going to put that on my next performance review as evidence of my success and impact.Corey: Absolutely. Grouchy people say, "It's all right," you know, for the right people that counts. 
If people want to learn more about what you're up to and see what you have to say, where can they find you?Kelsey: I aggregate most of my outward interactions on Twitter, so I'm @KelseyHightower and my DMs are open, so I'm happy to field any questions and I attempt to answer as many as I can.Corey: Excellent. Thank you so much for taking the time to speak with me today. I appreciate it.Kelsey: Awesome. I was happy to be here.Corey: Kelsey Hightower, Principal Developer Advocate at Google. I'm Corey Quinn. This is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on Apple Podcasts. If you've hated this podcast, please leave a five-star review on Apple Podcasts and then leave a funny comment. Thanks.Announcer: This has been this week's episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com or wherever fine snark is sold.Announcer: This has been a HumblePod production. Stay humble.

Screaming in the Cloud
Winning Hearts and Minds in Cloud with Brian Hall

Screaming in the Cloud

Play Episode Listen Later Dec 13, 2022 37:51


About BrianBrian leads the Google Cloud Product and Industry Marketing team. This team is focused on accelerating the growth of Google Cloud by establishing thought leadership, increasing demand and usage, enabling their sales teams and partners to tell their product stories with excellence, and helping their customers be the best advocates for them.Before joining Google, Brian spent over 25 years in product marketing or engineering in different forms. He started his career at Microsoft and had a very non-traditional path for 20 years. Brian worked in every product division except for cloud. He did marketing, product management, and engineering roles. And, early on, he was the first speech writer for Steve Ballmer and worked on Bill Gates' speeches too. His last role was building up the Microsoft Surface business from scratch as VP of the hardware businesses. After Microsoft, Brian spent a year as CEO at a hardware startup called Doppler Labs, where they made a run at transforming hearing, and then spent two years as VP at Amazon Web Services leading product marketing, developer advocacy, and a bunch more marketing teams.Brian has three kids still at home, Barty, Noli, and Alder, who are all named after trees in different ways. His wife Edie and him met right at the beginning of their first year at Yale University, where Brian studied math, econ, and philosophy and was the captain of the Swim and Dive team his senior year. Edie has a PhD in forestry and runs a sustainability and forestry consulting firm she started, that is aptly named “Three Trees Consulting”. As a family they love the outdoors, tennis, running, and adventures in Brian's 1986 Volkswagen Van, which is his first and only car, that he can't bring himself to get rid of.Links Referenced: Google Cloud: https://cloud.google.com @isforat: https://twitter.com/IsForAt LinkedIn: https://www.linkedin.com/in/brhall/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. 
Check out Veeam, that's V-E-E-A-M for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This episode is brought to us by our friends at Google Cloud and, as a part of that, they have given me someone to, basically, harass for the next half hour. Brian Hall is the VP of Product Marketing over at Google Cloud. Brian, welcome back.Brian: Hello, Corey. It's good to be here, and technically, we've given you time to harass me by speaking with me because you never don't have the time to harass me on Twitter and other places, and you're very good at it.Corey: Well, thank you. Again, we first met back when you were doing, effectively, the same role over at AWS. And before that, you spent only 20 years or so at Microsoft. So, you've now worked at all three of the large hyperscale cloud providers. You probably have some interesting perspectives on how the industry has evolved over that time. So, at the time of this recording, it is after Google Next and before re:Invent. There was also a Microsoft event there that I didn't pay much attention to. Where are we as a culture, as an industry, when it comes to cloud?Brian: Well, I'll start with: it is amazing how early days it still is. I don't want to put on my former Amazon cap too much, and I think it'd be pushing it a little bit to say it's complete and total day one with the cloud. But there's no question that there is a ton of evolution still to come. I mean, if you look at it, you can kind of break it into three eras so far. And roll with me here, and happy to take any dissent from you.But there was kind of a first era that was very much led by Amazon. We can call it the VM era or the component era, but being able to get compute on-demand, get nearly unlimited or actually unlimited storage with S3 was just remarkable. And it happened pretty quickly that startups, new tech companies, had to—like, it would be just wild to not start with AWS and actually start ordering servers and all that kind of stuff. And so, I look at that as kind of the first phase. And it was remarkable how long Amazon had a run really as the only player there. And maybe eight years ago—six years ago—we could argue on timeframes, things shifted a little bit because the enterprises, the big companies, and the governments finally realized, "Holy crow. This thing has gotten far enough that it's not just for these startups."Corey: Yeah. There was a real change. There was an eye-opening moment there where it isn't just, "I want to go and sell things online." It's, "And I also want to be a bank. Can we do that with you?" And, "Huh."Brian: My SAP—like I don't know how big that darn thing is going to get. Could I put it in your cloud? And, "Oh, by the way, CapEx forecasting stinks. Can you get me out of that?" And so, it became like the traditional IT infrastructure. All of the sudden, the IT guys showed up at the party, which I know is—it sounds fun to me, but that doesn't sound like the best addition to a party for many people. And so essentially, old-school IT infrastructure finally came to the cloud and Microsoft couldn't miss that happening when it did. But it was a major boon for AWS just because of the position that they had already. 
And I've noticed for the last few years, and I'm not entirely alone. When I go to re:Invent, and I look at announcements they're making, sure they have for the serverless stuff and how to run websites and EC2 nonsense. And then they're talking about IOT things and other things that just seem very oriented on a persona I don't understand. Everyone's doing stuff with mainframes now for example. And it feels like, “Oh, those of us who came here for the web services like it says on the name of the company aren't really feeling like it's for us anymore.” It's the problem of trying to be for everyone and pivoting to where the money is going, but Google's done this at least as much as anyone has in recent years. Are those of us who don't have corporate IT-like problems no longer the target market for folks or what's changed?Brian: It's still the target market, so like, you take the corporate IT, they're obviously still moving to the cloud. And there's a ton of opportunity. Just take existing IT spending and see a number over $1 trillion per year, and if you take the run rates of Microsoft, Amazon, Google Cloud, it's certainly over $100 billion, but that means it's still less than ten percent of what is existing IT spending. There are many people that think that existing IT spend number is significantly higher than that. But to your point on what's changing, there's actually a third wave that's happening.So, if the first wave was you start a company. You're a tech company, of course, you start it on AWS or on the Cloud. Second wave is all the IT people, IT departments, the central organizations that run technology for all the people that are not technology people come to the cloud. This third wave is everybody has to become a technology person. If you're a business leader, like you're at a fast-food restaurant and you're responsible for the franchisee relations, before, like, you needed to get an EDI system running or something, and so you told your IT department to figure out.Now, you have to actually think about what apps do we want to provide to our customers. How do I get the right data to my franchisees so that they can make business decisions? How can I automate all that? And you know, whereas before I was a guy wearing a suit or a gal wearing a suit who didn't need to know technology, I now have to. And that's what's changing the most. And it's why the Target Addressable Market—or the TAM as business folk sometimes say—it's really hard to estimate looking forward if every business is really needing to become a technology business in many ways. And it didn't dawn on me, honestly, and you can give me all the ribbing that I probably deserve for this—but it didn't really dawn on me until I came to Google and kept hearing the transformation word, “Digital transformation, digital transformation,” and honestly, having been in software for so long, I didn't really know what digital transformation meant until I started seeing all of these folks, like every company have to become a tech company effectively.Corey: Yeah. And it turns out there aren't enough technologists to go around, so it's very challenging to wind up getting the expertise in-house. It's natural to start looking at, “Well, how do we effectively outsource this?” And well, you can absolutely have a compression algorithm for experience. 
It's called, “Buying products and services and hiring people who have that experience already baked in either to the product or they show up knowing how to do something because they've done this before.”Brian: That's right. The thing I think we have to—for those of us that come from the technology side, this transformation is scary for the people who all of the sudden have to get tech and be like—Corey, if you or I—actually, you're very artistic, so maybe this wouldn't do it for you—but if I were told, “Hey, Brian, for your livelihood, you now need to incorporate painting,” like…Corey: [laugh]. I can't even write legibly let alone draw or paint. That is not my skill set. [laugh].Brian: I'd be like, “Wait, what? I'm not good at painting. I've never been a painting person, like I'm not creative.” “Okay. Great. Then we're going to fire you, or we're going to bring someone in who can.” Like, that'd be scary. And so, having more services, more people that can help as every company goes through a transition like that—and it's interesting, it's why during Covid, the cloud did really well, and some people kind of said, “Well, it's because they—people didn't want to send their people into their data centers.” No. That wasn't it. It was really because it just forced the change to digital. Like the person to, maybe, batter the analogy a little bit—the person who was previously responsible for all of the physical banks, which are—a bank has, you know, that are retail locations—the branches—they have those in order to service the retail customers.Corey: Yeah.Brian: That person, all of the sudden, had to figure out, “How do I do all that service via phone, via agents, via an app, via our website.” And that person, that entire organization, was forced digital in many ways. And that certainly had a lot of impact on the cloud, too.Corey: Yeah. I think that some wit observed a few years back that Covid has had more impact on your digital transformation than your last ten CIOs combined.Brian: Yeah.Corey: And—yeah, suddenly, you're forcing people into a position where there really is no other safe option. And some of that has unwound but not a lot of it. There's still seem to be those same structures and ability to do things from remote locations then there were before 2020.Brian: Yeah. Since you asked, kind of, where we are in the industry, to bring all of that to an endpoint, now what this means is people are looking for cloud providers, not just to have the primitives, not just to have the IT that they—their central IT needed, but they need people who can help them build the things that will help their business transform. It makes it a fun, new stage, new era, a transformation era for companies like Google to be able to say, “Hey, here's how we build things. Here's what we've learned over a period of time. Here's what we've most importantly learned from other customers, and we want to help be your strategic partner in that transformation.” And like I said, it'd be almost impossible to estimate what the TAM is for that. The real question is how quickly can we help customers and innovate in our Cloud solutions in order to make more of the stuff more powerful and faster to help people build.Corey: I want to say as well that—to be clear—you folks can buy my attention but not my opinion. I will not say things if I do not believe them. That's the way the world works here. But every time I use Google Cloud for something, I am taken aback yet again by the developer experience, how polished it is. 
And increasingly lately, it's not just that you're offering those low-lying primitives that composed together to build things higher up the stack, you're offering those things as well across a wide variety of different tooling options. And they just tend to all make sense and solve a need rather than requiring me to build it together myself from popsicle sticks.And I can't shake the feeling that that's where the industry is going. I'm going to want someone to sell me an app to do expense reports. I'm not going to want—well, I want a database and a front-end system, and how I wind up storing all the assets on the backend. No. I just want someone to give me something that solves that problem for me. That's what customers across the board are looking for as best I can see.Brian: Well, it certainly expands the number of customers that you can serve. I'll give you an example. We have an AI agent product called Call Center AI which allows you to either build a complete new call center solution, or more often it augments an existing call center platform. And we could sell that on an API call basis or a number of agent seats basis or anything like that. But that's not actually how call center leaders want to buy. Imagine we come in and say, “This many API calls or $4 per seat or per month,” or something like that. There's a whole bunch of work for that call center leader to go figure out, “Well, do I want to do this? Do I not? How should I evaluate it versus others?” It's quite complex. Whereas, if we come in and say, “Hey, we have a deal for you. We will guarantee higher customer satisfaction. We will guarantee higher agent retention. And we will save you money. And we will only charge you some percentage of the amount of money that you're saved.”Corey: It's a compelling pitch.Brian: Which is an easier one for a business decision-maker to decide to take?Corey: It's no contest. I will say it's a little odd that—one thing—since you brought it up, one thing that struck me as a bit strange about Contact Center AI, compared to most of the services I would consider to be Google Cloud, instead of, “Click here to get started,” it's, “Click here to get a demo. Reach out to contact us.” It feels—Brian: Yeah.Corey: —very much like the deals for these things are going to get signed on a golf course.Brian: [laugh]. They—I don't know about signed on a golf course. I do know that there is implementation work that needs to be done in order to build the models because it's the model for the AI, figuring out how your particular customers are served in your particular context that takes the work. And we need to bring in a partner or bring in our expertise to help build that out. But it sounds to me like you're looking to go golfing since you've looked into this situation.Corey: Just like painting, I'm no good at golfing either.Brian: [laugh].Corey: Honestly, it's—it just doesn't have the—the appeal isn't there for me for whatever reason. I smile; I nod; I tend to assume that, “Yeah, that's okay. I'll leave some areas for other people to go exploring in.”Brian: I see. I see.Corey: So, two weeks before Google Cloud Next occurred, you folks wound up canceling Stadia, which had been rumored for a while. People had been predicting it since it was first announced because, “Just wait. They're going to Google Reader it.” And yeah, it was consumer-side, and I do understand that that was not Cloud. 
But it did raise the specter of—for people to start talking once again about, “Oh, well, Google doesn't have any ability to focus on things long-term. They're going to turn off Cloud soon, too. So, we shouldn't be using it at all.” I do not agree with that assessment.But I want to get your take on it because I do have some challenges with the way that your products and services go to market in some ways. But I don't have the concern that you're going to turn it all off and decide, “Yeah, that was a fun experiment. We're done.” Not with Cloud, not at this point.Brian: Yeah. So, I'd start with at Google Cloud, it is our job to be a trusted enterprise platform. And I can't speak to before I was here. I can't speak to before Thomas Kurian, who's our CEO, was here before. But I can say that we are very, very focused on that. And deprecating products in a surprising way or in a way that doesn't take into account what customers are on it, how can we help those customers is certainly not going to help us do that. And so, we don't do that anymore.Stadia you brought up, and I wasn't part of starting Stadia. I wasn't part of ending Stadia. I honestly don't know anything about Stadia that any average tech-head might not know. But it is a different part of Google. And just like Amazon has deprecated plenty of services and devices and other things in their consumer world—and Microsoft has certainly deprecated many, many, many consumer and other products—like, that's a different model. And I won't say whether it's good, bad, or righteous, or not.But I can say at Google Cloud, we're doing a really good job right now. Can we get better? Of course. Always. We can get better at communicating, engaging customers in advance. But we now have a clean deprecation policy with a set of enterprise APIs that we commit to for stated periods of time. We also—like people should take a look. We're doing ten-year deals with companies like Deutsche Bank. And it's a sign that Google is here to last and Google Cloud in particular. It's also at a market level, just worth recognizing.We are a $27 billion run rate business now. And you earn trust in drips. You lose it in buckets. And we're—we recognize that we need to just keep every single day earning trust. And it's because we've been able to do that—it's part of the reason that we've gotten as large and as successful as we have—and when you get large and successful, you also tend to invest more and make it even more clear that we're going to continue on that path. And so, I'm glad that the market is seeing that we are enterprise-ready and can be trusted much, much more. But we're going to keep earning every single day.Corey: Yeah. I think it's pretty fair to say that you have definitely gotten yourselves into a place where you've done the things that I would've done if I wanted to shore up trust that the platform was not going to go away. Because these ten-year deals are with the kinds of companies that, shall we say, do not embark on signing contracts lightly. They very clearly, have asked you the difficult, pointed questions that I'm basically asking you now as cheap shots. And they ask it in very serious ways through multiple layers of attorneys. And if the answers aren't the right answers, they don't sign the contract. 
That is pretty clearly how the world works.The fact that companies are willing to move things like core trading systems over to you on a ten-year time horizon, tells me that I can observe whatever I want from the outside, but they have actual existential risk questions tied to what they're doing. And they are in some ways betting their future on you folks. You clearly know what those right answers are and how to articulate them. I think that's the side of things that the world does not get to see or think about very much. Because it is easy to point at all the consumer failings and the hundreds of messaging products that you continually replenish just in order to kill.Brian: [laugh].Corey: It's—like, what is it? The tree of liberty must be watered periodically from time to time, but the blood of patriots? Yeah. The logo of Google must be watered by the blood of canceled messaging products.Brian: Oh, come on. [laugh].Corey: Yeah. I'm going to be really scared if there's an actual, like, Pub/Sub service. I don't know. That counts as messaging, sort of. I don't know.Brian: [laugh]. Well, thank you. Thank you for the recognition of how far we've come in our trust from enterprises and trust from customers.Corey: I think it's the right path. There's also reputational issues, too. Because in the absence of new data, people don't tend to change their opinion on things very easily. And okay, there was a thing I was using. It got turned off. There was a big kerfuffle. That sticks in people's minds. But I've never seen an article about a Google service saying, "Oh, yeah. It hasn't been turned off or materially changed. In fact, it's gotten better with time. And it's just there working reliably." You're either invisible, or you're getting yelled at.It feels like it's a microcosm of my early career stage of being a systems administrator. I'm either invisible or the mail system's broke, and everyone wants my head. I don't know what the right answer is—Brian: That was about right to me.Corey: —in this thing. Yeah. I don't know what the right answer on these things is, but you're definitely getting it right. I think the enterprise API endeavors that you've gone through over the past year or two are not broadly known. And frankly, you're definitely ex-AWS because enterprise APIs is a terrible name for what these things are.Brian: [laugh].Corey: I'll let you explain it. Go ahead. And bonus points if you can do it without sounding like a press release. Take it away.Brian: There are a set of APIs that developers and companies should be able to know are going to be supported for the period of time that they need in order to run their applications and truly bet on them. And that's what we've done.Corey: Yeah. It's effectively a commitment that there will not be meaningful deprecations or changes to the API that are breaking changes without significant notice periods.Brian: Correct.Corey: And to be clear, that is exactly what all of the cloud providers have in their enterprise contracts. There are always notice periods around those things. There are always, at least, certain amounts of time and significant breach penalties in the event that, "Yeah, today, I decided that we were just not going to spin up VMs in that same way as we always have before. Sorry. Sucks to be you." I don't see that happening on the Google Cloud side of the world very often, not like it once did. And again, we do want to talk about reputations.There are at least four services that I'm aware of that AWS has outright deprecated. 
One, with Sumerian, they have said in public that they're sunsetting the service. But on the other end of the spectrum, RDS on VMware has been completely memory-holed. There's a blog post or two but nothing else remains in any of the AWS stuff, I'm sure, because that's an, "Enterprise-y" service, they wound up having one on one conversations with customers or there would have been a hue and cry. But every cloud provider does, in the fullness of time, turn some things off as they learn from their customers.Brian: Hmm. I hadn't heard anything about AWS Infinidash for a while either.Corey: No, no. It seems to be one of those great services that we made up on the internet one day for fun. And I love that just from a product marketing perspective. I mean, you know way more about that field than I do given that it's your job, and I'm just sitting here in the cheap seats throwing peanuts at you. But I love the idea of customers just coming up and making up a product one day in your space and then the storytelling that immediately happens thereafter. Most companies would kill for something like that just because you would expect on some level to learn so much about how your reputation actually works. When there's a platonic ideal of a service that isn't bothered by pesky things like, "It has to exist," what do people say about it? And how does that work?And I'm sort of surprised there wasn't more engagement from Amazon on that. It always seems like they're scared to say anything. Which brings me to a marketing question I have for you. You and Amazon have similar challenges—you being Google in this context, not you personally—in that your customers take themselves deadly seriously. And as a result, you have to take yourselves with at least that same level of seriousness. You can't go on Twitter and be the Wendy's Twitter account when you're dealing with enterprise buyers of cloud platforms. I'm kind of amazed, and I'd love to know. How can you manage to say anything at all? Because it just seems like you are so constrained, and there's no possible thing you can say that someone won't take issue with. And yes, some of the time, that someone is me.Brian: Well, let's start with going back to Infinidash a little bit. Yes, you identified one interesting thing about that episode, if I can call it an episode. The thing that I tell you though that didn't surprise me is it shows how much of cloud is actually learned from other people, not from the cloud provider itself. I—you're going to be going to re:Invent. You were at Google Cloud Next. Best thing about the industry conferences is not what the provider does. It's the other people that are there that you learn from. The folks that have done something that you've been trying to do and couldn't figure out how to do, and then they explained it to you, just the relationships that you get that help you understand what's going on in this industry that's changing so fast and has so much going on.And so, that part didn't surprise me. And that gets a little bit to the second part of your—that we're talking about. "How do you say anything?" As long as you're helping a customer say it. 
As long as you're helping someone who has been a fan of a product and has done interesting things with it say it, that's how you communicate for the most part, putting a megaphone in front of the people who already understand what's going on and helping their voice be heard, which is a lot more fun, honestly, than creating TV ads and banner ads and all of the stuff that a lot of consumer and traditional companies. We get to celebrate our customers and our creators much, much more.Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.Corey: I think that it's not super well understood by a lot of folks out there that the official documentation that any cloud provider puts out there is kind of a last resort. Or I'm looking for the specific flag to a specific parameter of a specific command. Great. Sure. But what I really want to do whenever I'm googling how to do something—and yes, that—we're going to be googling—welcome. You've successfully owned that space to the point where it's become common parlance. Good work is I want to see what other people had said. I want to find blog posts, ideally recent ones, talking about how to do the thing that I'm trying to do. If I'm trying to do something relatively not that hard or not that uncommon, if I spin up three web servers behind a load-balancer, and I can't find any community references on how to do that thing, either I'm trying to do something absolutely bizarre and I should re-think it, or there is no community/customer base for the product talking about how to do things with it.And I have noticed a borderline Cambrian explosion over the last few years of the Google Cloud community. I'm seeing folks who do not work at Google, and also who have never worked at Google, and sometimes still think they work at Google in some cases. It's not those folks. It is people who are just building things as a customer. And they, in turn, become very passionate advocates for the platform. And they start creating content on these things.Brian: Yeah. We've been blessed to have, not only, the customer base grow, but essentially the passion among that customer base, and we've certainly tried to help building community and catalyzing the community, but it's been fun to watch how our customers' success turns into our success which turns into customer success. And it's interesting, in particular, to see too how much of that passion comes from people seeing that there is another way to do things.It's clear that many people in our industry knew cloud through the lens of Amazon, knew tech in general through the lenses of Microsoft and Oracle and a lot of other companies. 
And Google, while we try and respect specifically what people are trying to accomplish and how they know how to do it, we also in many ways have taken a more opinionated approach, if you will, to say, "Hey, here's how this could be done in a different way." And when people find something that's unexpectedly different and also delightful, it's more likely that they're going to be strong advocates and share that passion with the world.Corey: It's a virtuous cycle that leads to the continued growth and success of a platform. Something I've been wondering about in the broader sense, is what happens after this? Because let's say, for the sake of argument, that one of the major cloud providers decided, "Okay. You know, we're going to turn this stuff off. We've decided we don't really want to be in the cloud business." It turns out that high-margin businesses that wind up turning into cash monsters as soon as you stop investing heavily in growing them just kind of throw off so much cash that, "We don't know what to do with it. And we're running out of spaces to store it. So, we're getting out of it." I don't know how that would even be possible at some point. Because given the amount of time and energy some customers take to migrate in, it would be a decade-long project for them to migrate back out again.So, it feels on some level like on the scale of a human lifetime, that we will be seeing the large public cloud providers, in more or less their current form, for the rest of our lives. Is that hopelessly naïve? Am I missing—am I overestimating how little change happens in the sweep of a human lifetime in technology?Brian: Well, I've been in the tech industry for 27 years now. And I've just seen a continual moving up the stack. Where, you know, there are fundamental changes. I think the PC becoming widespread, fundamental change; mobile, certainly becoming primary computing experience—what I know you call a toilet computer, I call my mobile; that's certainly been a change. Cloud has certainly been a change. And so, there are step functions for sure. But in general, what has been happening is things just keep moving up the stack. And as things move up the stack, there are companies that evolve and learn to do that and provide more value and more value to new folks. Like I talked about how businesspeople are leaders in technology now in a way that they never were before. And you need to give them the value in a way that they can understand it, and they can consume it, and they can trust it. And it's going to continue to move in that direction.And so, what happens then as things move up the stack, the abstractions start happening. And so, there are companies that were just major players in the '90s, whether it's Novell or Sun Microsystems or—I was actually getting a tour of the Sunnyvale/Mountain View Google Campuses yesterday. And the tour guide said, "This used to be the site of a company that was called Silicon Graphics. They did something around, like, making things for Avatar." I felt a little aged at that point.But my point is, there are these companies that were amazing in their time. They didn't move up the stack in a way that met the next set of needs. And it's not like that cratered the industry or anything, it's just people were able to move off of it and move up. And I do think that's what we'll see happening.Corey: In some cases, it seems to slip below the waterline and become, effectively, plumbing, where everyone uses it, but no one knows who they are or what they do. 
The Tier 1 backbone providers these days tend to be in that bucket. Sure, some of them have other businesses, like Verizon. People know who Verizon is, but they're one of the major Tier 1 carriers of the internet backbone in the United States.Brian: That's right. And that doesn't mean it's not still a great business.Corey: Yeah.Brian: It just means it's not front of mind for maybe the problems you're trying to solve or the opportunities we're trying to capture at that point in time.Corey: So, my last question for you goes circling back to Google Cloud Next. You folks announced an awful lot of things. And most of them, from my perspective, were actually pretty decent. What do you think is the most impactful announcement that you made that the industry largely overlooked?Brian: Most impactful that the industry—well, overlooked might be the wrong way to put this. But there's this really interesting thing happening in the cloud world right now where, whereas before companies, kind of, chose their primary cloud writ large, today—because multi-cloud is actually happening and the vast majority of companies have things in multiple places—people are also making the decision of, “What is going to be my strategic data provider?” And I don't mean data in the sense of the actual data and meta-data and the like, but my data cloud.Corey: Mm-hmm.Brian: How do I choose my data cloud specifically? And there's been this amazing profusion of new data companies that do better ETL or ELT, better data cleaning, better packaging for AI, new techniques for scaling up/scaling down at cost. A lot of really interesting stuff happening in the data space. But it's also created almost more silos. And so, the most important announcement that we made probably didn't seem like a really big announcement to a lot of people, but it really was about how we're connecting together more of our data cloud with BigQuery, with unstructured and structured data support, with support for data lakes—including new formats, with Iceberg and Delta and Hudi to come—and how Looker is increasingly working with BigQuery, in order to make it so that if you put data into Google Cloud, you not only have these super first-class services that you can use, ranging from databases like Spanner to BigQuery to Looker to AI services, like Vertex AI, but it's also now supporting all these different formats so you can bring third-party applications into that one place. And so, at the big cloud events, it's usually a new service that is the biggest deal. For us, the biggest deal is how this data cloud is coming together in an open way to let you use the tool that you want to use, whether it's from Google or a third party, all by betting on Google's data cloud.Corey: I'm really impressed by how Google is rather clearly thinking about this from the perspective that the data has to be accessible by a bunch of different things, even though it may take wildly different forms. It is making the data more fluid in that it can go to where the customer needs it to be rather than expecting the customer to come to it where it lives. That, I think, is a trend that we have not seen before in this iteration of the tech industry.Brian: I think you got that—you picked that up very well. And to some degree, if you step back and look at it, it maybe shouldn't be that surprising that Google is adept at that.
When you think of what Google search is, how YouTube is essentially another search engine producing videos that deliver on what you're asking for, how information is used with Google Maps, with Google Lens, how it is all about taking information and making it as universally accessible and helpful as possible. And if we can do that for the internet's information, why can't we help businesses do it for their business information? And that's a lot of where Google certainly has a unique approach with Google Cloud.Corey: I really want to thank you for being so generous with your time. If people want to learn more about what you're up to, where's the best place for them to find you?Brian: cloud.google.com for Google Cloud information of course. And if it's still running when this podcast goes out, @isforat, I-S-F-O-R-A-T, on Twitter.Corey: And we will put links to both of those in the show notes. Thank you so much for your time. I appreciate it.Brian: Thank you, Corey. It's been good talking with you.Corey: Brian Hall, VP of Product Marketing at Google Cloud. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice. Whereas, if you've hated this podcast, please, leave a five-star review on your podcast platform of choice along with an insulting angry comment dictating that, “No. Large companies make ten-year-long commitments casually all the time.”Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Google Cloud Reader
How Spanner and BigQuery work together to handle transactional and analytical workloads

Google Cloud Reader

Play Episode Listen Later Dec 8, 2022 11:17


Original blog post More articles at cloud.google.com/blog

Python Bytes
#311 Catching Memory Leaks with ... pytest?

Python Bytes

Play Episode Listen Later Nov 24, 2022 49:50


Watch on YouTube. About the show: Python Bytes 311, sponsored by Microsoft for Startups Founders Hub. Connect with the hosts — Michael: @mkennedy@fosstodon.org, Brian: @brianokken@fosstodon.org. Special guest: Murilo Cunha.
Michael #1: Latexify. We are used to turning beautiful math into programming symbols. For example: amitness.com/2019/08/math-for-programmers/#sigma
Take this code:
    def do_math(a, b, c): return (-b + math.sqrt(b ** 2 - 4 * a * c)) / (2 * a)
Add the @latexify.function decorator, display do_math in a notebook, and get this LaTeX:
    \mathrm{do\_math}(a, b, c) = \frac{-b + \sqrt{b^{2} - 4 a c}}{2 a}
which renders as the formatted equation in the notebook. I could only get it to install with: pip install git+https://github.com/google/latexify_py
Brian #2: prefixed. From Avram Lubkin: “Prefixed provides an alternative implementation of the built-in float which supports formatted output with SI (decimal) and IEC (binary) prefixes.”
    >>> from prefixed import Float
    >>> f'{Float(3250):.2h}'
    '3.25k'
    >>> '{:.2h}s'.format(Float(.00001534))
    '15.34μs'
    >>> '{:.2k}B'.format(Float(42467328))
    '40.50MiB'
    >>> f'{Float(2048):.2m}B'
    '2.00KB'
Because prefixed.Float inherits from the built-in float, it behaves exactly the same in most cases. When a math operation is performed with another real number type (float, int), the result will be a prefixed.Float instance. Also interesting: First new SI prefixes for over 30 years; the new prefixes also show up here.
Murilo #3: dbt. Open source tool, CLI tool, built with Python.
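Pulling the prefixed snippets above into one runnable sketch (this assumes pip install prefixed works in your environment; the expected outputs are the ones quoted in the show notes):

    from prefixed import Float

    # SI (decimal) prefix: 3250 formats as '3.25k'
    print(f"{Float(3250):.2h}")

    # Small values get SI prefixes too: 0.00001534 formats as '15.34μ' (the 's' unit is appended by the format string)
    print("{:.2h}s".format(Float(0.00001534)))

    # IEC (binary) prefix: 42467328 formats as '40.50Mi' (plus the 'B' unit)
    print("{:.2k}B".format(Float(42467328)))

    # Math on a prefixed.Float returns another prefixed.Float,
    # since it subclasses the built-in float.
    total = Float(2048) * 2
    print(f"{total:.2m}B")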

Screaming in the Cloud
The Art and Science of Database Innovation with Andi Gutmans

Screaming in the Cloud

Play Episode Listen Later Nov 23, 2022 37:07


About AndiAndi Gutmans is the General Manager and Vice President for Databases at Google. Andi's focus is on building, managing and scaling the most innovative database services to deliver the industry's leading data platform for businesses. Before joining Google, Andi was VP Analytics at AWS running services such as Amazon Redshift. Before his tenure at AWS, Andi served as CEO and co-founder of Zend Technologies, the commercial backer of open-source PHP.Andi has over 20 years of experience as an open source contributor and leader. He co-authored open source PHP. He is an emeritus member of the Apache Software Foundation and served on the Eclipse Foundation's board of directors. He holds a bachelor's degree in Computer Science from the Technion, Israel Institute of Technology.Links Referenced: LinkedIn: https://www.linkedin.com/in/andigutmans/ Twitter: https://twitter.com/andigutmans TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode is brought to us by our friends at Google Cloud, and in so doing, they have gotten a guest to appear on this show that I have been low-key trying to get here for a number of years. Andi Gutmans is VP and GM of Databases at Google Cloud. Andi, thank you for joining me.Andi: Corey, thanks so much for having me.Corey: I have to begin with the obvious. Given that one of my personal passion projects is misusing every cloud service I possibly can as a database, where do you start and where do you stop as far as saying, “Yes, that's a database,” so it rolls up to me and, “No, that's not a database, so someone else can deal with the nonsense?”Andi: I'm in charge of the operational databases, so that includes both the managed third-party databases such as MySQL, Postgres, SQL Server, and then also the cloud-first databases, such as Spanner, Big Table, Firestore, and AlloyDB. So, I suggest that's where you start because those are all awesome services. And then what doesn't fall underneath, kind of, that purview are things like BigQuery, which is an analytics, you know, data warehouse, and other analytics engines. And of course, there's always folks who bring in their favorite, maybe, lesser-known or less popular database and self-manage it on GCE, on Compute.Corey: Before you wound up at Google Cloud, you spent roughly four years at AWS as VP of Analytics, which is, again, one of those very hazy type of things. Where does it start? Where does it stop? It's not at all clear from the outside. 
But even before that, you were, I guess, something of a legendary figure, which I know is always a weird thing for people to hear.But you were at least partially responsible for the Zend Framework in the PHP world, and I didn't realize what the heck that was, despite supporting it in production at a couple of jobs, until after I, for better or worse, was no longer trusted to support production environments anymore. Which, honestly, if you can get out, I'm a big proponent of doing that. You sleep so much better without a pager. How did you go from programming languages all the way on over to databases? It just seems like a very odd mix.Andi: Yeah. No, that's a great question. So, I was one of the core developers of PHP, and you know, I had been in the PHP community for quite some time. I also helped ideate the Zend Framework, and the company that, you know, I co-founded, Zend Technologies, was kind of the company behind PHP.So, like Red Hat supports Linux commercially, we supported PHP. And I was very much focused on developers, programming languages, frameworks, IDEs, and that was, you know, really exciting. I had also done quite a bit of work on interoperability with databases, right, because behind every application, there's a database, and so a lot of what we focused on is great connectivity to MySQL, to Postgres, to other databases, and I got to kind of learn the database world from the outside from the application builders. We sold our company in I think it was 2015 and so I had to kind of figure out what's next. And so, one option would have been, hey, stay in programming languages, but what I learned over the many years that I worked with application developers is that there's a huge amount of value in data.And frankly, I'm a very curious person; I always like to learn, so there was this opportunity to join Amazon, to join the non-relational database side, and take myself completely out of my comfort zone. And actually, I joined AWS to help build the graph database Amazon Neptune, which was even more out of my comfort zone than even probably a relational database. So, I kind of like to do different things and so I joined and I had to learn, you know, how to build a database pretty much from the ground up. I mean, of course, I didn't do the coding, but I had to learn enough to be dangerous, and so I worked on a bunch of non-relational databases there such as, you know, Neptune, Redis, Elasticsearch, DynamoDB Accelerator. And then there was the opportunity for me to actually move over from non-relational databases to analytics, which was another way to get myself out of my comfort zone.And so, I moved to run the analytics space, which included services like Redshift, like EMR, Athena, you name it. So, that was just a great experience for me where I got to work with a lot of awesome people and learn a lot. And then the opportunity arose to join Google and actually run the Google transactional databases including their older relational databases. And by the way, I actually have two jobs. One job is running Spanner and Big Table for Google itself—meaning, you know, search ads and YouTube and everything runs on these databases—and then the second job is actually running external-facing databases for external customers.Corey: How alike are those two? Is it effectively the exact same thing, just with different API endpoints? Are they two completely separate universes?
It's always unclear from the outside when looking at large companies that effectively eat versions of their own dog food, where their internal usage of these things starts and stops.Andi: So, great question. So, Cloud Spanner and Cloud Big Table do actually use the internal Spanner and Big Table. So, at the core, it's exactly the same engine, the same runtime, same storage, and everything. However, you know, kind of, internally, the way we built the database APIs was kind of good for scrappy, you know, Google engineers, and you know, folks who are kind of okay learning how to fit into the Google ecosystem, but when we needed to make this work for enterprise customers, we needed cleaner APIs, we needed authentication that was external, right, and so on, so forth. So, think about it: we had to add an additional set of APIs on top of it, and management, right, to really make these engines accessible to the external world.So, it's running the same engine under the hood, but it is a different set of APIs, and a big part of our focus is continuing to expose to enterprise customers all the goodness that we have on the internal system. So, it's really about taking these very, very unique differentiated databases and democratizing access to them to anyone who wants to.Corey: I'm curious to get your position on the idea that seems to be playing—I guess, a battle that's been playing itself out in a number of different customer conversations. And that is, I guess, the theoretical decision between, do we go towards general-purpose databases and more or less treat every problem as a nail in search of a hammer or do you decide that every workload gets its own custom database that aligns the best with that particular workload? There are trade-offs in either direction, but I'm curious where you land on that given that you tend to see a lot more of it than I do.Andi: No, that's a great question. And you know, just for the viewers who maybe aren't aware, there's kind of two extreme points of view, right? There's one point of view that says, purpose-built for everything, like, every specific pattern, like, build bespoke databases, it's kind of a best-of-breed approach. The problem with that approach is it becomes extremely complex for customers, right? Extremely complex to decide what to use, they might need to use multiple for the same application, and so that can be a bit daunting as a customer. And frankly, there's kind of a law of diminishing returns at some point.Corey: Absolutely. I don't know what the DBA role of the future is, but I don't think anyone really wants it to be, “Oh, yeah. We're deciding which one of these three dozen managed database services is the exact right fit for each and every individual workload.” I mean, at some point it feels like certain cloud providers believe that not only every workload should have its own database, but almost every workload should have its own database service. At some point, you're allowed to say no and stop building these completely, what feel like to me, Byzantine, esoteric database engines that don't seem to have broad applicability to a whole lot of problems.Andi: Exactly, exactly.
And maybe the other extreme is what folks often talk about as multi-model where you say, like, “Hey, I'm going to have a single storage engine and then map onto that the relational model, the document model, the graph model, and so on.” I think what we tend to see is if you go too generic, you also start having performance issues, you may not be getting the right level of abilities and trade-offs around consistency, and replication, and so on. So, I would say Google, like, we're taking a very pragmatic approach where we're saying, “You know what? We're not going to solve all of customer problems with a single database, but we're also not going to have two dozen.” Right?So, we're basically saying, “Hey, let's understand that the main characteristics of the workloads that our customers need to address, build the best services around those.” You know, obviously, over time, we continue to enhance what we have to fit additional models. And then frankly, we have a really awesome partner ecosystem on Google Cloud where if someone really wants a very specialized database, you know, we also have great partners that they can use on Google Cloud and get great support and, you know, get the rest of the benefits of the platform.Corey: I'm very curious to get your take on a pattern that I've seen alluded to by basically every vendor out there except the couple of very obvious ones for whom it does not serve their particular vested interests, which is that there's a recurring narrative that customers are demanding open-source databases for their workloads. And when you hear that, at least, people who came up the way that I did, spending entirely too much time on Freenode, back when that was not a deeply problematic statement in and of itself, where, yes, we're open-source, I guess, zealots is probably the best terminology, and yeah, businesses are demanding to participate in the open-source ecosystem. Here in reality, what I see is not ideological purity or anything like that and much more to do with, “Yeah, we don't like having a single commercial vendor for our databases that basically plays the insert quarter to continue dance whenever we're trying to wind up doing something new. We want the ability to not have licensing constraints around when, where, how, and how quickly we can run databases.” That's what I hear when customers are actually talking about open-source versus proprietary databases. Is that what you see or do you think that plays out differently? Because let's be clear, you do have a number of database services that you offer that are not open-source, but are also absolutely not tied to weird licensing restrictions either?Andi: That's a great question, and I think for years now, customers have been in a difficult spot because the legacy proprietary database vendors, you know, knew how sticky the database is, and so as a result, you know, the prices often went up and was not easy for customers to kind of manage costs and agility and so on. But I would say that's always been somewhat of a concern. I think what I'm seeing changing and happening differently now is as customers are moving into the cloud and they want to run hybrid cloud, they want to run multi-cloud, they need to prove to their regulator that it can do a stressed exit, right, open-source is not just about reducing cost, it's really about flexibility and kind of being in control of when and where you can run the workloads. 
So, I think what we're really seeing now is a significant surge of customers who are trying to get off legacy proprietary database and really kind of move to open APIs, right, because they need that freedom. And that freedom is far more important to them than even the cost element.And what's really interesting is, you know, a lot of these are the decision-makers in these enterprises, not just the technical folks. Like, to your point, it's not just open-source advocates, right? It's really the business people who understand they need the flexibility. And by the way, even the regulators are asking them to show that they can flexibly move their workloads as they need to. So, we're seeing a huge interest there and, as you said, like, some of our services, you know, are open-source-based services, some of them are not.Like, take Spanner, as an example, it is heavily tied to how we build our infrastructure and how we build our systems. Like, I would say, it's almost impossible to open-source Spanner, but what we've done is we've basically embraced open APIs and made sure if a customer uses these systems, we're giving them control of when and where they want to run their workloads. So, for example, Big Table has an HBase API; Spanner now has a Postgres interface. So, our goal is really to give customers as much flexibility and also not lock them into Google Cloud. Like, we want them to be able to move out of Google Cloud so they have control of their destiny.Corey: I'm curious to know what you see happening in the real world because I can sit here and come up with a bunch of very well-thought-out logical reasons to go towards or away from certain patterns, but I spent years building things myself. I know how it works, you grab the closest thing handy and throw it in and we all know that there is nothing so permanent as a temporary fix. Like, that thing is load-bearing and you'll retire with that thing still in place. In the idealized world, I don't think that I would want to take a dependency on something like—easy example—Spanner or AlloyDB because despite the fact that they have Postgres-squeal—yes, that's how I pronounce it—compatibility, the capabilities of what they're able to do under the hood far exceed and outstrip whatever you're going to be able to build yourself or get anywhere else. So, there's a dataflow architectural dependency lock-in, despite the fact that it is at least on its face, Postgres compatible. Counterpoint, does that actually matter to customers in what you are seeing?Andi: I think it's a great question. I'll give you a couple of data points. I mean, first of all, even if you take a complete open-source product, right, running them in different clouds, different on-premises environments, and so on, fundamentally, you will have some differences in performance characteristics, availability characteristics, and so on. So, the truth is, even if you use open-source, right, you're not going to get a hundred percent of the same characteristics where you run that. But that said, you still have the freedom of movement, and with I would say and not a huge amount of engineering investment, right, you're going to make sure you can run that workload elsewhere.I kind of think of Spanner in the similar way where yes, I mean, you're going to get all those benefits of Spanner that you can't get anywhere else, like unlimited scale, global consistency, right, no maintenance downtime, five-nines availability, like, you can't really get that anywhere else. 
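(For anyone curious what that looks like from code, here is a minimal, hedged sketch of reading from Cloud Spanner with the google-cloud-spanner Python client — the project, instance, database, table, and column names are all placeholders, and it assumes the instance and schema already exist.)

    from google.cloud import spanner

    client = spanner.Client(project="my-project")        # placeholder project
    instance = client.instance("my-small-instance")      # placeholder instance ID
    database = instance.database("orders-db")            # placeholder database ID

    # A strongly consistent read through a snapshot; table and columns are assumptions.
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            "SELECT OrderId, Amount FROM Orders ORDER BY OrderId LIMIT 10"
        )
        for order_id, amount in rows:
            print(order_id, amount)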
That said, not every application necessarily needs it. And you still have that option, right, that if you need to, or want to, or we're not giving you a reasonable price or reasonable price performance, but we're starting to neglect you as a customer—which of course we wouldn't, but let's just say hypothetically, that you know, that could happen—that you still had a way to basically go and run this elsewhere. Now, I'd also want to talk about some of the upsides something like Spanner gives you. Because you talked about, you want to be able to just grab a few things, build something quickly, and then, you know, you don't want to be stuck.The counterpoint to that is with Spanner, you can start really, really small, and then let's say you're a gaming studio, you know, you're building ten titles hoping that one of them is going to take off. So, you can build ten of those, you know, with very minimal spend on Spanner and if one takes off overnight, it's really only the database where you don't have to go and re-architect the application; it's going to scale as big as you need it to. And so, it does enable a lot of this innovation and a lot of cost management as you try to get to that overnight success.Corey: Yeah, overnight success. I always love that approach. It's one of those, “Yeah, I became an overnight success after only ten short years.” It becomes this idea people believe it's in fits and starts, but then you see, I guess, on some level, the other side of it where it's a lot of showing up and doing the work. I have to confess, I didn't do a whole lot of admin work in my production years that touched databases because I have an aura and I'm unlucky, and it turns out that when you blow away some web servers, everyone can laugh and we'll reprovision stateless things.Get too close to the data warehouse, for example, and you don't really have a company left anymore. And of course, in the world of finance that I came out of, transactional integrity is also very much a thing. A question that I had [centers 00:17:51] really around one of the predictions you gave recently at Google Cloud Next, which is your prediction for the future is that transactional and analytical workloads from a database perspective will converge. What's that based on?Andi: You know, I think we're really moving from a world where customers are trying to make real-time decisions, right? If there's model drift from an AI and ML perspective, want to be able to retrain their models as quickly as possible. So, everything is fast moving into streaming. And I think what you're starting to see is, you know, customers don't have that time to wait for analyzing their transactional data. Like in the past, you do a batch job, you know, once a day or once an hour, you know, move the data from your transactional system to analytical system, but that's just not how it is always-on businesses run anymore, and they want to have those real-time insights.So, I do think that what you're going to see is transactional systems more and more building analytical capabilities, analytical systems building, and more transactional, and then ultimately, cloud platform providers like us helping fill that gap and really making data movement seamless across transactional analytical, and even AI and ML workloads. And so, that's an area that I think is a big opportunity. I also think that Google is best positioned to solve that problem.Corey: Forget everything you know about SSH and try Tailscale. 
Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.Basically you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive?Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.Corey: On some level, I've found that, at least in my own work, that once I wind up using a database for something, I'm inclined to try and stuff as many other things into that database as I possibly can just because getting a whole second data store, taking a dependency on it for any given workload tends to be a little bit on the, I guess, challenging side. Easy example of this. I've talked about it previously in various places, but I was talking to one of your colleagues, [Sarah Ellis 00:19:48], who wound up at one point making a joke that I, of course, took way too far. Long story short, I built a Twitter bot on top of Google Cloud Functions that every time the Azure brand account tweets, it simply quote-tweets that, translates their tweet into all caps, and then puts a boomer-style statement in front of it if there's room. This account is @cloudboomer.Now, the hard part that I had while doing this is everything stateless works super well. Where do I wind up storing the ID of the last tweet that it saw on its previous run? And I was fourth and inches from just saying, “Well, I'm already using Twitter so why don't we use Twitter as a database?” Because everything's a database if you're either good enough or bad enough at programming. And instead, I decided, okay, we'll try this Firebase thing first.And I don't know if it's Firestore, or Datastore or whatever it's called these days, but once I wrapped my head around it, it was incredibly effective, very fast to get up and running, and I feel like I made at least a good decision, for once in my life, involving something touching databases. But it's hard. I feel like I'm consistently drawn toward the thing I'm already using as a default database. I can't shake the feeling that that's the wrong direction.Andi: I don't think it's necessarily wrong. I mean, I think, you know, with Firebase and Firestore, that combination is just extremely easy and quick to build awesome mobile applications. And actually, you can build mobile applications without a middle tier which is probably what attracted you to that. So, we just see, you know, a huge amount of developers and applications. We have over 4 million databases in Firestore with just developers building these applications, especially mobile-first applications. So, I think, you know, if you can get your job done and get it done effectively, absolutely stick to them.And by the way, one thing a lot of people don't know about Firestore is it's actually running on Spanner infrastructure, so Firestore has the same five-nines availability, no maintenance downtime, and so on, that Spanner has, and the same kind of ability to scale.
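(As an aside, the bookkeeping Corey describes — remembering the last tweet ID between runs — is only a few lines with the google-cloud-firestore Python client. A hedged sketch; the project, collection, and document names are made up for illustration.)

    from google.cloud import firestore

    db = firestore.Client(project="my-project")  # placeholder project

    # One tiny document holds the bot's state between invocations.
    state_ref = db.collection("bot-state").document("cloudboomer")

    def load_last_tweet_id():
        snapshot = state_ref.get()
        return snapshot.get("last_tweet_id") if snapshot.exists else None

    def save_last_tweet_id(tweet_id):
        state_ref.set({"last_tweet_id": tweet_id})

    print(load_last_tweet_id())
    save_last_tweet_id("1590000000000000000")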
So, it's not just that it's quick, it will actually scale as much as you need it to and be as available as you need it to. So, that's on that piece. I think, though, to the same point, you know, there's other databases that we're then trying to make sure kind of also extend their usage beyond what they've traditionally done. So, you know, for example, we announced AlloyDB, which I kind of call it Postgres on steroids, we added analytical capabilities to this transactional database so that as customers do have more data in their transactional database, as opposed to having to go somewhere else to analyze it, they can actually do real-time analytics within that same database and it can actually do up to 100 times faster analytics than open-source Postgres.So, I would say both Firestore and AlloyDB, are kind of good examples of if it works for you, right, we'll also continue to make investments so the amount of use cases you can use these databases for continues to expand over time.Corey: One of the weird things that I noticed just looking around this entire ecosystem of databases—and you've been in this space long enough to, presumably, have seen the same type of evolution—back when I was transiting between different companies a fair bit, sometimes because I was consulting and other times because I'm one of the greatest in the world at getting myself fired from jobs based upon my personality, I found that the default standard was always, “Oh, whatever the database is going to be, it started off as MySQL and then eventually pivots into something else when that starts falling down.” These days, I can't shake the feeling that almost everywhere I look, Postgres is the answer instead. What changed? What did I miss in the ecosystem that's driving that renaissance, for lack of a better term?Andi: That's a great question. And, you know, I have been involved in—I'm going to date myself a bit—but in PHP since 1997, pretty much, and one of the things we kind of did is we build a really good connector to MySQL—and you know, I don't know if you remember, before MySQL, there was MS SQL. So, the MySQL API actually came from MS SQL—and we bundled the MySQL driver with PHP. And so, kind of that LAMP stack really took off. And kind of to your point, you know, the default in the web, right, was like, you're going to start with MySQL because it was super easy to use, just fun to use.By the way, I actually wrote—co-authored—the tab completion in the MySQL client. So like, a lot of these kinds of, you know, fun, simple ways of using MySQL were there, and frankly, was super fast, right? And so, kind of those fast reads and everything, it just was great for web and for content. And at the time, Postgres kind of came across more like a science project. Like the folks who were using Postgres were kind of the outliers, right, you know, the less pragmatic folks.I think, what's changed over the past, how many years has it been now, 25 years—I'm definitely dating myself—is a few things: one, MySQL is still awesome, but it didn't kind of go in the direction of really, kind of, trying to catch up with the legacy proprietary databases on features and functions. Part of that may just be that from a roadmap perspective, that's not where the owner wanted it to go. So, MySQL today is still great, but it didn't go into that direction. In parallel, right, customers wanting to move more to open-source. 
And so, what they found is that the thing that actually looks and smells more like legacy proprietary databases is actually Postgres, plus you saw an increase of investment in the Postgres ecosystem, which also has a very liberal license.So, you have lots of other databases including commercial ones that have been built off the Postgres core. And so, I think you are today in a place where, for mainstream enterprise, Postgres is it because that is the thing that has all the features that the enterprise customer is used to. MySQL is still very popular, especially in, like, content and web, and mobile applications, but I would say that Postgres has really become kind of that de facto standard API that's replacing the legacy proprietary databases.Corey: I've been on the record way too much as saying, with some justification, that the best database in the world that should be used for everything is Route 53, specifically, TXT records. It's a key-value store and then anyone who's deep enough into DNS or databases generally gets a slightly greenish tinge and feels ill. That is my simultaneous best and worst database. I'm curious as to what your most controversial opinion is about the worst database in the world that you've ever seen.Andi: This is the worst database? Or—Corey: Yeah. What is the worst database that you've ever seen? I know, at some level, since you manage all things database, I'm asking you to pick your least favorite child, but here we are.Andi: Oh, that's a really good question. No, I would say probably the, “Worst database,” double-quotes is just the file system, right? When folks are basically using the file system as a regular database. And that can work for, you know, really simple apps, but as apps get more complicated, that's not going to work. So, I've definitely seen some of that.I would say the most awesome database that is also file system-based, kind of embedded, I think was actually SQLite, you know? And SQLite is actually still very, very popular. I think it sits on every mobile device pretty much on the planet. So, I actually think it's awesome, but it's, you know, it's not a database server. It's kind of an embedded database, but it's something that I, you know, I've always been pretty excited about. And, you know, there's [unintelligible 00:27:43] kind of new, interesting databases emerging that are also embedded, like DuckDB is quite interesting. You know, it's kind of the SQLite for analytics.Corey: We've been using it for a few things around bill analysis ourselves. It's impressive. I've also got to say, people think that we had something to do with it because we're The Duckbill Group, and it's DuckDB. “Have you done anything with this?” And the answer is always, “Would you trust me with a database? I didn't think so.” So no, it's just a weird coincidence. But I liked that a lot.It's also counterintuitive from where I sit because I'm old enough to remember when Microsoft was teasing the idea of WinFS, a future file system that fundamentally was a database—I believe it was an index or journal for all of that—and I don't believe anything ever came of it. But ugh, that felt like a really weird alternate world we could have lived in.Andi: Yeah. Well, that's a good point. And by the way, you know, if I actually take a step back, right, and I kind of half-jokingly said, you know, file system and obviously, you know, all the popular databases persist on the file system.
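(Side note: the embedded engines mentioned here are easy to kick the tires on locally. A minimal sketch with DuckDB — it assumes pip install duckdb, and the CSV path is a placeholder.)

    import duckdb

    con = duckdb.connect(":memory:")

    # Plain SQL over an in-memory relation.
    print(con.execute("SELECT 40 + 2 AS answer").fetchall())

    # DuckDB shines at analytics over local files, e.g. a cost export
    # (placeholder file; read_csv_auto infers the schema).
    rows = con.execute(
        """
        SELECT service, SUM(cost) AS total_cost
        FROM read_csv_auto('billing_export.csv')
        GROUP BY service
        ORDER BY total_cost DESC
        """
    ).fetchall()
    for service, total in rows:
        print(service, total)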
But if you look at what's different in cloud-first databases, right, like, if you look at legacy proprietary databases, the typical setup is write to the local disk and then do asynchronous replication with some kind of bounded replication lag to somewhere else, to a different region, or so on. If you actually start to look at what the cloud-first databases look like, they actually write the data in multiple data centers at the same time.And so, kind of joke aside, as you start to think about, “Hey, how do I build the next generation of applications and how do I really make sure I get the resiliency and the durability that the cloud can offer,” it really does take a new architecture. And so, that's where things like, you know, Spanner and Big Table, and kind of, AlloyDB databases are truly architected for the cloud. That's where they actually think very differently about durability and replication, and what it really takes to provide the highest level of availability and durability.Corey: On some level, I think one of the key things for me to realize was that in my own experiments, whenever I wind up doing something that is either for fun or I just want to see how it works and what's possible, the scale of what I'm building is always inherently a toy problem. It's like the old line that if it fits in RAM, you don't have a big data problem. And then I'm looking at things these days that have most of a petabyte's worth of RAM sometimes; okay, that definition continues to extend and get ridiculous. But I still find that most of what I do in a database context can be done with almost any database. There's no reason for me not to, for example, use a SQLite file or to use an object store—just there's a little latency, but whatever—or even a text file on disk.The challenge I find is that as you start scaling and growing these things, you start to run into limitations left and right, and only then it's one of those, oh, I should have made different choices or I should have built in abstractions. But so many of these things come to nothing; it just feels like extra work. What guidance do you have for people who are trying to figure out how much effort to put in upfront when they're just more or less puttering around to see what comes out of it?Andi: You know, we like to think about ourselves at Google Cloud as really having a unique value proposition that really helps you future-proof your development. You know, if I look at both Spanner and I look at BigQuery, you can actually start with a very, very low cost. And frankly, not every application has to scale. So, you can start at low cost, you can have a small application, but everyone wants two things: one is availability because you don't want your application to be down, and number two is if you have to scale you want to be able to without having to rewrite your application. And so, I think this is where we have a very unique value proposition, both in how we built Spanner and then also how we built BigQuery, is that you can actually start small, and for example, on Spanner, you can go from one-tenth of what we call an instance, like, a small instance, that is, you know, under $65 a month, you can go to a petabyte scale OLTP environment with thousands of instances in Spanner, with zero downtime.And so, I think that is really the unique value proposition.
We're basically saying you can hold the stick at both ends: you can basically start small and then, if that application does need to scale, does need to grow, you're not reengineering your application and you're not taking any downtime for reprovisioning. So, I think that's—if I had to give folks, kind of, advice, I say, “Look, what's done is done. You have workloads on MySQL, Postgres, and so on. That's great.”Like, they're awesome databases, keep on using them. But if you're truly building a new app, and you're hoping that app is going to be successful at some point—even if, like you said, all overnight successes take at least ten years—if you've built it on something like Spanner, you don't actually have to think about that anymore or worry about it, right? It will scale when you need it to scale and you're not going to have to take any downtime for it to scale. So, that's how we see a lot of these industries that have these potential spikes, like gaming, retail, also some use cases in financial services, they basically gravitate towards these databases.Corey: I really want to thank you for taking so much time out of your day to talk with me about databases and your perspective on them, especially given my profound level of ignorance around so many of them. If people want to learn more about how you view these things, where's the best place to find you?Andi: Follow me on LinkedIn. I tend to post quite a bit on LinkedIn, I still post a bit on Twitter, but frankly, I've moved more of my activity to LinkedIn now. I find it's—Corey: That is such a good decision. I envy you.Andi: It's a more curated [laugh], you know, audience and so on. And then also, you know, we just had Google Cloud Next. I recorded a session there that kind of talks about databases and just some of the things that are new in database-land at Google Cloud. So, that's another thing: if folks are interested in getting more information, that may be something that could be appealing to you.Corey: We will, of course, put links to all of this in the [show notes 00:34:03]. Thank you so much for your time. I really appreciate it.Andi: Great. Corey, thanks so much for having me.Corey: Andi Gutmans, VP and GM of Databases at Google Cloud. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment; then I'm going to collect all of those angry, insulting comments and use them as a database.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Google Cloud Platform Podcast
ML/AI Data Science for Data Analytics with Jed Dougherty and Dan Darnell

Google Cloud Platform Podcast

Play Episode Listen Later Nov 9, 2022 32:13


On the show this week, Carter Morgan and Anu Srivastava talk about AI and ML data analytics with Dataiku VP of Platform Strategy, Jed Dougherty, and Head of Product Marketing, Dan Darnell. Dataiku is an AI platform targeted for business team collaboration. The low and no code environments make it easy for developers and not so tech savvy employees to work together on analytics projects. It strives for everyday AI, making these normally highly technical data processes more accessible. Our guests detail the tools Dataiku provides customers, including ML Ops features for efficient models. Dataiku's managed offering allows businesses to concentrate on the model while Dataiku takes care of things like the deployment processes behind the scenes. We hear about the partnership between Dataiku and Google Cloud and Dataiku's integration with AlloyDB. Through a real example, our guests run us through the use of these two tools together. Jed talks about why Google Cloud works so well with Dataiku, especially for businesses looking for cutting edge technology. Jed Dougherty Jed is the VP of Platform Strategy at Dataiku. In this role he acts as a strategic technical advisor to Dataiku customers and prospects. He also works tightly with Engineering and Product stakeholders in order to ensure that all technical platform requests are properly followed, scoped and implemented. Dan Darnell Dan has over 20 years of experience in the analytics industry at established software companies, hyper-growth technology companies, and small technology start-ups. As the Head of Product Marketing at Dataiku, he owns positioning, evangelism, and content creation for product offerings and education on products for customers and partners. Cool things of the week Google Cloud supercharges NLP with large language models blog Practicing the principle of least privilege with Cloud Build and Artifact Registry blog Interview Dataiku site Dataiku YouTube videos BigQuery site Kubernetes site GKE site AlloyDB for PostgreSQL site Accelerate AI Adoption: 3 Steps to Deploy Dataiku for Google Cloud Platform blog Implementing Dataiku with BigQuery docs GCP Podcast Episode 238: ASML with Arnaud Hubaux podcast GCP Podcast Episode 229: Lucidworks with Radu Miclaus podcast What's something cool you're working on? Anu is working on interesting speech use cases and Google's Speech to Text. Join in with this tutorial! Carter is working on getting organized and working on something super cool! Hosts Carter Morgan and Anu Srivastava

Google Cloud Platform Podcast
Assured Workloads with Key Access Justifications with Bryce Buffaloe and Seth Denney

Google Cloud Platform Podcast

Play Episode Listen Later Nov 2, 2022 42:17


Hosts Max Saltonstall and Daryl Ducharme are joined by Bryce Buffaloe and Seth Denney to chat about Assured Workloads and the sovereignty control Key Access Justifications so customers can see how their data is used and control who can see what. Assured Workloads with Google is a security and compliance engine that allows users to control their data with the help of Google. With the expansion of data use around the globe, data sovereignty has become more important as well, and Google Cloud products offer myriad tools to maintain control, privacy, and compliance no matter the location. Seth talks more about sovereignty and how it's changing data storage and management. Our guests talk about how Google has tackled the sovereignty issues, difficult decisions that had to be made, and the process of working with clients to optimize tools for different security and sovereignty scenarios. With Key Access Justifications, Google has bolstered its offerings to provide clients with trustworthy controls to keep data secure and sovereign, from Compute Engine VMs to BigQuery. We learn what Key Access Justifications look like for users and how the encryption keys work in different Google Cloud services. Customer managed key material is stored outside of Google and the key manager must give permission for access for an added layer of trust and security. Seth and Bryce explain why this is important and describe how KAJ are used with some examples. These features may also be used to improve security in the future by preventing data from being decrypted and stolen should someone ever get access to your system. We hear more about the future of data security and sovereignty, including simplifying the process with managed services and easier onboarding. Strategic European partnerships are helping Google tackle these important issues overseas so clients can focus on their businesses and worry less about data security. The catalyst for KAJ was a large German bank that recognized the sovereignty changes coming, and we hear more about the origins of KAJ and the path to where it is today. When paired with Assured Workloads, clients get maximum sovereignty coverage. Seth talks a little about the Sovereignty Access Controls done internally as well. Bryce walks us through using these Google services with a European example. Bryce Buffaloe Bryce is Product manager for Google Cloud Security managing the portfolio of the Assured Workload's solution suite. Seth Denney Seth is KAJ Tech Lead, responsible for ensuring the integrity and usefulness of KAJs to support customer data sovereignty Cool things of the week DevFests site Best Kept Security Secrets: Tap into the power of Organization Policy Service podcast Interview Assured Workloads site Assured Workloads Playlist videos Key Access Justifications docs Compute Engine site BigQuery site GCP Podcast Episode 325: Digital Sovereignty with Archana Ramamoorthy and Julien Blanchez podcast T Systems site What's something cool you're working on? Daryl just released a video about using Workflows' new parallel step. Max is working on crossover episodes across our various podcast streams, so we can have SRE guests on to the GCP podcast to talk reliability, for example, or bring some of the Kubernetes hosts to the Cloud Security podcast to discuss securing Kubernetes workloads. Hosts Max Saltonstall and Daryl Ducharme
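As a loose illustration of the customer-managed-key layer that Key Access Justifications build on — this is not the KAJ API itself — here is a hedged sketch of creating a BigQuery dataset whose tables default to a Cloud KMS key, using the google-cloud-bigquery Python client; the project, location, key ring, and key names are placeholders.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project

    kms_key = (
        "projects/my-project/locations/europe-west3/"
        "keyRings/my-ring/cryptoKeys/my-key"  # placeholder CMEK resource name
    )

    dataset = bigquery.Dataset("my-project.assured_dataset")  # placeholder dataset
    dataset.location = "europe-west3"
    # Every table created in this dataset defaults to the customer-managed key.
    dataset.default_encryption_configuration = bigquery.EncryptionConfiguration(
        kms_key_name=kms_key
    )
    dataset = client.create_dataset(dataset, exists_ok=True)
    print(f"Created {dataset.dataset_id} with a CMEK default")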

Datascape Podcast
Episode 69 - Recapping The Fall 2022 Google Next Conference With Sandeep Arora

Datascape Podcast

Play Episode Listen Later Oct 20, 2022 54:38


In this episode, Warner is joined by Sandeep Arora to discuss the latest announcements from the Fall 2022 Google Next conference! These include new cloud regions, new BigQuery features, AlloyDB, Spanner, AI and ML announcements, and more!

Google Cloud Platform Podcast
Top 5 Data & Analytics Launches from Next 2022 with Bruno Aziza and Maire Newton

Google Cloud Platform Podcast

Play Episode Listen Later Oct 19, 2022 30:51


Debi Cabrera and Stephanie Wong have more great Next content this week as we focus on launches specifically related to data and analytics with guests Bruno Aziza and Maire Newton. We start the episode with a look at current customer trends in data, including tools for increasing efficiency when working with many different types of data. Data governance and security is another area where Bruno sees advances in satisfying customer needs. Maire talks about the steps Google is taking to help customers implement knowledge gained with data, including Looker and new integrations with tools like Looker Studio to easily connect tools for better data access and use. Strategic partnerships with companies like Tableau help accomplish these goals as well. With 21 data and analytics launches at Next, exciting solutions are out there for customers. Bruno and Maire highlight their five favorites, like BigQuery support for unstructured data, allowing analysts working with SQL to do more with more data. To simplify workflows, BigQuery integration with Spark is a new feature that Maire tells us about, and we hear more about BigLake and its increased format support. Data reaches more people more easily now with Connected Sheets available for anyone using Google Workspace, and finally we talk more about Looker. Bruno details the four use cases of business intelligence customers and how Google's suite of data products satisfies their needs for a reasonable price. Bruno Aziza Bruno is head of data and analytics for Google Cloud and leads the outbound product management team. He has more than two decades of Silicon Valley experience, specializing in scaling businesses, and has written two books on Data Analytics and Enterprise Performance Management. Maire Newton Maire is an Outbound Product Manager at Google Cloud with almost 15 years of experience partnering with organizations to develop data solutions and drive digital transformation. She's passionate about helping customers develop data-driven cultures by using technology to meet users where they are. Cool things of the week Google Cloud Next for data professionals: analytics, databases and business intelligence blog ANA104 How Boeing overcame their on-premises implementation challenges with data & AI site ANA100 What's new in Looker and Data Studio site ANA101 What's new in BigQuery site ANA106 How leading organizations are making open source their super power site Google Cloud Next: top AI and ML sessions blog Interview Building the most open data cloud ecosystem post Data Journeys videos Google Cloud Next ‘22 site Looker site Looker Studio site Tableau site BigLake site BigQuery site Use the BigQuery connector with Spark docs Connected Sheets docs What's something cool you're working on? Debi is getting married and working on Dataflow Prime. Hosts Stephanie Wong and Debi Cabrera
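As a rough sketch of BigQuery and Spark working together — this uses the long-standing spark-bigquery connector rather than necessarily the new integration announced at Next — the project, dataset, table, and bucket names are placeholders.

    from pyspark.sql import SparkSession

    # Assumes the spark-bigquery connector is available on the cluster
    # (Dataproc images bundle it; elsewhere, add the connector JAR yourself).
    spark = SparkSession.builder.appName("bq-spark-sketch").getOrCreate()

    # Read a public BigQuery table into a Spark DataFrame.
    df = (
        spark.read.format("bigquery")
        .option("table", "bigquery-public-data.samples.shakespeare")
        .load()
    )

    # A small aggregation in Spark, written back to BigQuery (placeholder targets).
    counts = df.groupBy("corpus").sum("word_count")
    (
        counts.write.format("bigquery")
        .option("table", "my_project.my_dataset.word_counts")
        .option("temporaryGcsBucket", "my-temp-bucket")
        .mode("overwrite")
        .save()
    )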

Google Cloud Reader
Exporting and Analyzing Billing Data Using BigQuery

Google Cloud Reader

Play Episode Listen Later Oct 18, 2022 6:48


Original blog post More articles at cloud.google.com/blog

Google Cloud Platform Podcast
Next 2022 with Forrest Brazeal

Google Cloud Platform Podcast

Play Episode Listen Later Oct 12, 2022 43:44


Forrest Brazeal joins Stephanie Wong today on the second day of Google Cloud Next ‘22. We're talking about all the exciting announcements, how the conference has changed in recent years, and what to expect in the days ahead. The excitement and energy of the first in-person Next since 2019 was one of the best parts for Forrest. With 1300 releases in just half the year, a lot has happened in BigQuery, AI, Looker, and more. Next includes announcements in many of these areas as well, as Google Cloud expands and makes Cloud easier for all types of projects and clients. Strategic partnerships and development have allowed better use of Google Cloud for the virtual work world and advancements in sustainability have helped Google users feel better about their impact on the environment. New announcements in compute include C3 VMs, the first VM in the cloud with 4th Gen Intel Xeon scalable processors with Google's custom Intel IPU. MediaCDN uses the YouTube infrastructure and the new Live Stream API optimizes streaming capabilities. Among many other announcements, Network Analyzer is now GA allowing for simplified network configuration monitoring and Google Cloud Armor has been extended to include ML-based Adaptive Protection capabilities. Software Delivery Shield and Cloud Workstations are recent offerings to help developers in each of the four areas of software supply chain management. Advancements in Cloud Build include added security benefits, and new GKE and Cloud Run logging and security alerts ensure projects remain secure through the final stages of development. The best way to ensure secure, optimized work is with well-trained developers. And in that vein, Google Cloud is introducing Innovators Plus to provide a new suite of developer benefits under a fixed cost subscription. Forrest tells us about #GoogleClout and the challenges available in the Next portal for conference-goers. Assured Workloads helps with data sovereignty in different regions, Confidential Space in Confidential Computing provides trust guarantees when companies perform joint data analysis and machine learning training, and Chronicle Security Operations are some of the exciting security announcements we saw at Next. On the show next week, we'll go in depth on data announcements at Next, but Steph gives us a quick rundown of some of the biggest ones today. She talks briefly about announcements in AI, including Vertex AI Vision and Translation Hub. Forrest wraps up by talking about predictions for the future of tech and cloud. Forrest Brazeal Forrest Brazeal is a cloud educator, author, speaker, and Pwnie Award-winning songwriter. He is the creator of the Cloud Resume Challenge initiative, which has helped thousands of non-traditional learners take their first steps into the cloud. 
Cool things of the week
Unlock biology & medicine potential with AlphaFold on Google Cloud video
Interview
Google Cloud Next ‘22 site Google Cloud Innovators site What's next for digital transformation in the cloud blog New cloud regions coming to a country near you blog The next wave of Google Cloud infrastructure innovation: New C3 VM and Hyperdisk blog 20+ Cloud Networking innovations unveiled at Google Cloud Next blog Introducing Software Delivery Shield for end-to-end software supply chain security blog Developers - Build, learn, and grow your career faster with Google Cloud blog Advancing digital sovereignty on Europe's terms blog Introducing Confidential Space to help unlock the value of secure data collaboration blog Introducing Chronicle Security Operations: Detect, investigate, and respond to cyberthreats with the speed, scale, and intelligence of Google blog What's new in Google Cloud databases: More unified. More open. More intelligent. blog Building the most open data cloud ecosystem: Unifying data across multiple sources and platforms blog Introducing the next evolution of Looker, your unified business intelligence platform blog Vertex AI Vision site New AI Agents can drive business results faster: Translation Hub, Document AI, and Contact Center AI blog Open source collaborations and key partnerships to help accelerate AI innovation blog Google Cloud Launches First-of-Its-Kind Service to Simplify Mainframe Modernization for Customers in Financial Services, Retail, Healthcare and Other Industries article Project Starline expands testing through an early access program blog
What's something cool you're working on?
Steph is working on the developer keynote and DevFest and UKI Google Cloud Next Developer Day. Check out her Next talk “Simplify and secure your network for all workloads”.
Hosts Stephanie Wong

The Cloud Pod
184: The CloudPod Explicitly trusts itself

The Cloud Pod

Play Episode Listen Later Oct 7, 2022 52:22


On The Cloud Pod this week, AWS announces an update to IAM role trust policy behavior, Easily Collect Vehicle Data and Send to the Cloud with new AWS IoT FleetWise, now generally available, Get a head start with no-cost learning challenges before Google Next ‘22.
Thank you to our sponsor, Foghorn Consulting, which provides top notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you're having trouble hiring? Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.
Episode Highlights
⏰ AWS announces an update to IAM role trust policy behavior.
⏰ Easily Collect Vehicle Data and Send to the Cloud with new AWS IoT FleetWise, now generally available.
⏰ Get a head start with no-cost learning challenges before Google Next ‘22.
General News:

AI and the Future of Work
Ahmed Elsamadisi, Narrator CEO, is a roboticist by training and one of the first engineers at WeWork. Now he's changing how the world tells stories with data.

AI and the Future of Work

Play Episode Listen Later Oct 2, 2022 52:08


Ahmed Elsamadisi built the data infrastructure at WeWork before realizing every company could benefit from his team's innovation. Traditional star schemas aren't the best way to manage data. Ahmed instead pioneered a new approach using a single-table column model better suited for real questions people ask (a rough sketch of this style of model follows these show notes). He launched Narrator in 2017 to make it easier to turn data questions into answers and has since raised $6.2M from Initialized Capital, Flybridge Capital Partners, and Y Combinator. Ahmed received his BS in Robotics from Cornell. Hear from a pioneer (and tech provocateur) how new data wrangling techniques are making it easier for mere mortals to get more value out of their data.
Listen and learn…
How a roboticist who got his start building self-driving cars and designing missile defense systems ended up redefining how data is stored
Why traditional approaches that require SQL to access data are broken
How a single-column schema eliminates the complexity of joining systems and tables
Why it's easier to tell better stories with data using temporal relationships extracted from customer journeys
Why Snowflake, Redshift, and BigQuery are really all the same… and data modeling is the place to innovate
What it means to replace traditional tables with activities… and why they'll eliminate the need for specialized data analysts
How to reduce data storage costs by 90% and time to generate data insights from weeks to minutes
Why data management vendors are responsible for bad decisions made using your data
What is data cleaning and how you should do it
What is a racist algorithm
Why querying data with natural language will never work
Is the WeCrashed version of Adam Neumann's neuroticism accurate? Hear from someone who lived it...
References in this episode:
Google's LaMDA isn't sentient
Chandra Khatri from Got It AI on AI and the Future of Work
Derek Steer from Mode on AI and the Future of Work
Barr Moses from Monte Carlo on AI and the Future of Work
Peter Fishman from Mozart Data on AI and the Future of Work
Ahmed on Twitter
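To make the "activities instead of tables" idea above a bit more concrete, here is a toy, in-memory Python sketch of a single activity stream: one append-only table of (customer, timestamp, activity) rows, where questions are answered by ordering events in time rather than joining many entity tables. This is a simplified illustration of the concept, not Narrator's actual schema or implementation.

```python
# Toy illustration of a single activity stream replacing many joined tables.
# Simplified sketch only; not Narrator's real schema or engine.
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Activity:
    customer: str
    ts: datetime
    activity: str  # e.g. "visited_site", "started_trial", "paid_invoice"

STREAM: List[Activity] = [
    Activity("alice", datetime(2022, 9, 1), "visited_site"),
    Activity("alice", datetime(2022, 9, 3), "started_trial"),
    Activity("alice", datetime(2022, 9, 20), "paid_invoice"),
    Activity("bob", datetime(2022, 9, 5), "visited_site"),
    Activity("bob", datetime(2022, 9, 7), "started_trial"),
]

def first_after(customer: str, earlier: str, later: str) -> Optional[datetime]:
    """When did `later` first happen for this customer after `earlier` first happened?"""
    events = sorted((a for a in STREAM if a.customer == customer), key=lambda a: a.ts)
    start = next((a.ts for a in events if a.activity == earlier), None)
    if start is None:
        return None
    return next((a.ts for a in events if a.activity == later and a.ts > start), None)

# A temporal question answered from one table, with no joins:
# "How long after starting a trial did each customer first pay?"
for customer in sorted({a.customer for a in STREAM}):
    trial = first_after(customer, "visited_site", "started_trial")
    paid = first_after(customer, "started_trial", "paid_invoice")
    if trial and paid:
        print(f"{customer}: paid {(paid - trial).days} days after starting a trial")
    elif trial:
        print(f"{customer}: started a trial but has not paid yet")
```

In a warehouse, the same question would typically be a window-function query over one customer-and-timestamp-keyed table, which is the property the episode credits for collapsing data modeling complexity.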

Google Cloud Platform Podcast
DEI and Belonging in the Cloud with Jason Smith

Google Cloud Platform Podcast

Play Episode Listen Later Sep 28, 2022 33:26


Jason Smith, founder of the Mixed Googlers group here at Google, joins Stephanie Wong to talk about DEI and the importance of belonging in tech. Jason helps us better understand what the concepts of diversity, equity, inclusion, and belonging mean to him. It's more than just including different types of people, Jason tells us; companies must also give them equal opportunities and a say in their jobs. We talk about the difference between DEI and belonging. Belonging means feeling comfortable and accepted, and it conveys a more concrete, real-life sense of community that brings DEI to life. While DEI is easy enough for a company to measure, it's sometimes tricky to get a clear picture of belonging in a company. Jason talks about possible solutions to this problem. Growing up as the child of a white parent and a Black parent, Jason understands the importance of feeling a sense of belonging as a mixed race individual. In that vein, he founded Mixed Googlers, and he tells us more about how this group supports other mixed individuals at Google. He talks about the events they have hosted, including talks with famous mixed race speakers, and how the grassroots efforts to form and grow Mixed Googlers have created a great community. Later, Jason talks about DEI and belonging in tech companies and cloud specifically. He introduces us to some fun ways to incorporate DEI principles into company culture in a way that encourages all individuals to contribute their personal perspectives. He stresses the importance of allowing mistakes, especially when discussing diversity issues with your coworkers, so the conversation can be about growth and not about confrontation.
Jason Smith
Jason Smith is a Customer Engineer supporting application modernization and the founder of Mixed Googlers, an ERG dedicated to mixed race individuals.
Cool things of the week
Sign up for the Google Cloud Fly Cup Challenge blog Google Cloud Firewall introduces Network Firewall Policies, IAM-governed Tags and more blog Building trust in the data with Dataplex blog
Interview
Google Belonging site Google 2022 Diversity Annual Report site Sugi Dakks: Not the Only One | Talks at Google video Diversity, Equity, and Inclusion Keynote site BigQuery site
What's something cool you're working on?
Stephanie is working on content for Next and the Drone Racing League.
Hosts Stephanie Wong

Screaming in the Cloud
How Data Discovery is Changing the Game with Shinji Kim

Screaming in the Cloud

Play Episode Listen Later Sep 22, 2022 32:58


About ShinjiShinji Kim is the Founder & CEO of Select Star, an automated data discovery platform that helps you to understand & manage your data. Previously, she was the Founder & CEO of Concord Systems, a NYC-based data infrastructure startup acquired by Akamai Technologies in 2016. She led the strategy and execution of Akamai IoT Edge Connect, an IoT data platform for real-time communication and data processing of connected devices. Shinji studied Software Engineering at University of Waterloo and General Management at Stanford GSB.Links Referenced: Select Star: https://www.selectstar.com/ LinkedIn: https://www.linkedin.com/company/selectstarhq/ Twitter: https://twitter.com/selectstarhq TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your Features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.Corey: I come bearing ill tidings. Developers are responsible for more than ever these days. Not just the code that they write, but also the containers and the cloud infrastructure that their apps run on. Because serverless means it's still somebody's problem. And a big part of that responsibility is app security from code to cloud. And that's where our friend Snyk comes in. Snyk is a frictionless security platform that meets developers where they are - Finding and fixing vulnerabilities right from the CLI, IDEs, Repos, and Pipelines. Snyk integrates seamlessly with AWS offerings like code pipeline, EKS, ECR, and more! As well as things you're actually likely to be using. Deploy on AWS, secure with Snyk. Learn more at Snyk.co/scream That's S-N-Y-K.co/screamCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while, I encounter a company that resonates with something that I've been doing on some level. In this particular case, that is what's happened here, but the story is slightly different. My guest today is Shinji Kim, who's the CEO and founder at Select Star.And the joke that I was making a few months ago was that Select Stars should have been the name of the Oracle ACE program instead. Shinji, thank you for joining me and suffering my ridiculous, basically amateurish and sophomore database-level jokes because I am bad at databases. Thanks for taking the time to chat with me.Shinji: Thanks for having me here, Corey. 
Good to meet you.Corey: So, Select Star despite being the only query pattern that I've ever effectively been able to execute from memory, what you do as a company is described as an automated data discovery platform. So, I'm going to start at the beginning with that baseline definition. I think most folks can wrap their heads around what the idea of automated means, but the rest of the words feel like it might mean different things to different people. What is data discovery from your point of view?Shinji: Sure. The way that we define data discovery is finding and understanding data. In other words, think about how discoverable your data is in your company today. How easy is it for you to find datasets, fields, KPIs of your organization data? And when you are looking at a table, column, dashboard, report, how easy is it for you to understand that data underneath? Encompassing on that is how we define data discovery.Corey: When you talk about data lurking around the company in various places, that can mean a lot of different things to different folks. For the more structured data folks—which I tend to think of as the organized folks who are nothing like me—that tends to mean things that live inside of, for example, traditional relational databases or things that closely resemble that. I come from a grumpy old sysadmin perspective, so I'm thinking, oh, yeah, we have a Jira server in the closet and that thing's logging to its own disk, so that's going to be some information somewhere. Confluence is another source of data in an organization; it's usually where insight and a knowledge of what's going on goes to die. It's one of those write once, read never type of things.And when I start thinking about what data means, it feels like even that is something of a squishy term. From the perspective of where Select Start starts and stops, is it bounded to data that lives within relational databases? Does it go beyond that? Where does it start? Where does it stop?Shinji: So, we started the company with an intention of increasing the discoverability of data and hence providing automated data discovery capability to organizations. And the part where we see this as the most effective is where the data is currently being consumed today. So, this is, like, where the data consumption happens. So, this can be a data warehouse or data lake, but this is where your data analysts, data scientists are querying data, they are building dashboards, reports on top of, and this is where your main data mart lives.So, for us, that is primarily a cloud data warehouse today, usually has a relational data structure. On top of that, we also do a lot of deep integrations with BI tools. So, that includes tools like Tableau, Power BI, Looker, Mode. Wherever these queries from the business stakeholders, BI engineers, data analysts, data scientists run, this is a point of reference where we use to auto-generate documentation, data models, lineage, and usage information, to give it back to the data team and everyone else so that they can learn more about the dataset they're about to use.Corey: So, given that I am seeing an increased number of companies out there talking about data discovery, what is it the Select Star does that differentiates you folks from other folks using similar verbiage in how they describe what they do?Shinji: Yeah, great question. There are many players that popping up, and also, traditional data catalog's definitely starting to offer more features in this area. 
The main differentiator that we have in the market today, we call it fast time-to-value. Any customer that is starting with Select Star, they get to set up their instance within 24 hours, and they'll be able to get all the analytics and data models, including column-level lineage, popularity, ER diagrams, and how other people are—top users and how other people are utilizing that data, like, literally in few hours, max to, like, 24 hours. And I would say that is the main differentiator.And most of the customers I have pointed out that setup and getting started has been super easy, which is primarily backed by a lot of automation that we've created underneath the platform. On top of that, just making it super easy and simple to use. It becomes very clear to the users that it's not just for the technical data engineers and DBAs to use; this is also designed for business stakeholders, product managers, and ops folks to start using as they are learning more about how to use data.Corey: Mapping this a little bit toward the use cases that I'm the most familiar with, this big source of data that I tend to stumble over is customer AWS bills. And that's not exactly a big data problem, given that it can fit in memory if you have a sufficiently exciting computer, but using Tableau don't wind up slicing and dicing that because at some point, Excel falls down. From my perspective, problem with Excel is that it doesn't tend to work on huge datasets very well, and from the position of Salesforce, the problem with Excel is that it doesn't cost a giant pile of money every month. So, those two things combined, Tableau is the answer for what we do. But that's sort of the end-all for us of, that's where it stops.At that point, we have dashboards that we build and queries that we run that spit out the thing we're looking at, and then that goes back to inform our analysis. We don't inherently feed that back into anything else that would then inform the rest of what we do. Now, for our use case, that probably makes an awful lot of sense because we're here to help our customers with their billing challenges, not take advantage of their data to wind up informing some giant model and mispurposing that data for other things. But if we were generating that data ourselves as a part of our operation, I can absolutely see the value of tying that back into something else. You wind up almost forming a reinforcing cycle that improves the quality of data over time and lets you understand what's going on there. What are some of the outcomes that you find that customers get to by going down this particular path?Shinji: Yeah, so just to double-click on what you just talked about, the way that we see this is how we analyze the metadata and the activity logs—system logs, user logs—of how that data has been used. So, part of our auto-generated documentation for each table, each column, each dashboard, you're going to be able to see the full data lineage: where it came from, how it was transformed in the past, and where it's going to. You will also see what we call popularity score: how many unique users are utilizing this data inside the organization today, how often. And utilizing these two core models and analysis that we create, you can start looking at first mapping out the data flow, and then determining whether or not this dataset is something that you would want to continue keeping or running the data pipelines for. 
Because once you start mapping these usage models of tables versus dashboards, you may find that there are recurring jobs that creates all these materialized views and tables that are feeding dashboards that are not being looked at anymore.So, with this mechanism by looking initially data lineage as a concept, a lot of companies use data lineage in order to find dependencies: what is going to break if I make this change in the column or table, as well as just debugging any of issues that is currently happening in their pipeline. So, especially when you will have to debug a SQL query or pipeline that you didn't build yourself but you need to find out how to fix it, this is a really easy way to instantly find out, like, where the data is coming from. But on top of that, if you start adding this usage information, you can trace through where the main compute is happening, which largest route table is still being queried, instead of the more summarized tables that should be used, versus which are the tables and datasets that is continuing to get created, feeding the dashboards and is those dashboards actually being used on the business side. So, with that, we have customers that have saved thousands of dollars every month just by being able to deprecate dashboards and pipelines that they were afraid of deprecating in the past because they weren't sure if anyone's actually using this or not. But adopting Select Star was a great way to kind of do a full spring clean of their data warehouse as well as their BI tool. And this is an additional benefit to just having to declutter so many old, duplicated, and outdated dashboards and datasets in their data warehouse.Corey: That is, I guess, a recurring problem that I see in many different pockets of the industry as a whole. You see it in the user visibility space, you see it in the cost control space—I even made a joke about Confluence that alludes to it—this idea that you build a whole bunch of dashboards and use it to inform all kinds of charts and other systems, but then people are busy. It feels like there's no ‘and then.' Like, one of the most depressing things in the universe that you can see after having spent a fair bit of effort to build up those dashboards is the analytics for who internally has looked at any of those dashboards since the demo you gave showing it off to everyone else. It feels like in many cases, we put all these projects and amount of effort into building these things out that then don't get used.People don't want to be informed by data they want to shoot from their gut. Now, sometimes that's helpful when we're talking about observability tools that you use to trace down outages, and, “Well, our site's really stable. We don't have to look at that.” Very awesome, great, awesome use case. The business insight level of dashboard just feels like that's something you should really be checking a lot more than you are. How do you see that?Shinji: Yeah, for sure. I mean, this is why we also update these usage metrics and lineage every 24 hours for all of our customers automatically, so it's just up-to-date. And the part that more customers are asking for where we are heading to—earlier, I mentioned that our main focus has been on analyzing data consumption and understanding the consumption behavior to drive better usage of your data, or making data usage much easier. The part that we are starting to now see is more customers wanting to extend those feature capabilities to their staff of where the data is being generated. 
So, connecting the similar amount of analysis and metadata collection for production databases, Kafka Queues, and where the data is first being generated is one of our longer-term goals. And then, then you'll really have more of that, up to the source level, of whether the data should be even collected or whether it should even enter the data warehouse phase or not.Corey: One of the challenges I see across the board in the data space is that so many products tend to have a very specific point of the customer lifecycle, where bringing them in makes sense. Too early and it's, “Data? What do you mean data? All I have are these logs, and their purpose is basically to inflate my AWS bill because I'm bad at removing them.” And on the other side, it's, “Great. We pioneered some of these things and have built our own internal enormous system that does exactly what we need to do.” It's like, “Yes, Google, you're very smart. Good job.” And most people are somewhere between those two extremes. Where are customers on that lifecycle or timeline when using Select Star makes sense for them?Shinji: Yeah, I think that's a great question. Also the time, the best place where customers would use Select Star for is that after they have their cloud data warehouse set up. Either they have finished their migration, they're starting to utilize it with their BI tools, and they're starting to notice that it's not just, like, you know, ten to fifty tables that they're starting with; most of them have more than hundreds of tables. And they're feeling that this is starting to go out of control because we have all these data, but we are not a hundred percent sure what exactly is in our database. And this usually just happens more in larger companies, companies at thousand-plus employees, and they usually find a lot of value out of Select Star right away because, like, we will start pointing out many different things.But we also see a lot of, like, forward-thinking, fast-growing startups that are at the size of a few hundred employees, you know, they now have between five to ten-person data team, and they are really creating the right single source of truth of their data knowledge through a Select Star. So, I think you can start anywhere from when your data team size is, like, beyond five and you're continuing to grow because every time you're trying to onboard a data analyst, data scientist, you will have to go through, like, basically the same type of training of your data model, and it might actually look different because the data models and the new features, new apps that you're integrating this changes so quickly. So, I would say it's important to have that base early on and then continue to grow. But we do also see a lot of companies coming to us after having thousands of datasets or tens of thousands of datasets that it's really, like, very hard to operate and onboard anyone. And this is a place where we really shine to help their needs, as well.Corey: Sort of the, “I need a database,” to the, “Help, I have too many databases,” pipeline, where [laugh] at some point people start to—wanting to bring organization to the chaos. One thing I like about your model is that you don't seem to be making the play that every other vendor in the data space tends to, which is, “Oh, we want you to move your data onto our systems. The end.” You operate on data that is in place, which makes an awful lot of sense for the kinds of things that we're talking about. 
Customers are flat out not going to move their data warehouse over to your environment, just because the data gravity is ludicrous. Just the sheer amount of money it would take to egress that data from a cloud provider, for example, is monstrous.Shinji: Exactly. [laugh]. And security concerns. We don't want to be liable for any of the data—and this is, like, a very specific decision we've made very early on the company—to not access data, to not egress any of the real data, and to provide as much value as possible just utilizing the metadata and logs. And depending on the types of data warehouses, it also can be really efficient because the query history or the metadata systems tables are indexed separately. Usually, it's much lighter load on the compute side. And that definitely has, like, worked well for our advantage, especially being a SaaS tool.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.Corey: What I like is just how straightforward the integrations are. It's clear you're extraordinarily agnostic as far as where the data itself lives. You integrate with Google's BigQuery, with Amazon Redshift, with Snowflake, and then on the other side of the world with Looker, and Tableau, and other things as well. And one of the example use cases you give is find the upstream table in BigQuery that a Looker dashboard depends on. That's one of those areas where I see something like that, and, oh, I can absolutely see the value of that.I have two or three DynamoDB tables that drive my newsletter publication system that I built—because I have deep-seated emotional problems and I take it out and everyone else via code—but as a small, contained system that I can still fit in my head. Mostly. And I still forget which table is which in some cases. Down the road, especially at scale, “Okay, where is the actual data source that's informing this because it doesn't necessarily match what I'm expecting,” is one of those incredibly valuable bits of insight. It seems like that is something that often gets lost; the provenance of data doesn't seem to work.And ideally, you know, you're staffing a company with reasonably intelligent people who are going to look at the results of something and say, “That does not align with my expectations. I'm going to dig.” As opposed to the, “Oh, yeah, that seems plausible. I'll just go with whatever the computer says.” There's an ocean of nuance between those two, but it's nice to be able to establish the validity of the path that you've gone down in order to set some of these things up.Shinji: Yeah, and this is also super helpful if you're tasked to debug a dashboard or pipeline that you did not build yourself. Maybe the person has left the company, or maybe they're out-of-office, but this dashboard has been broken and you're quote-unquote, “On call,” for data. What are you going to do? You're going to—without a tool that can show you a full lineage, you will have to start digging through somebody else's SQL code and try to map out, like, where the data is coming from, if this is calculating correctly. 
Usually takes, you know, few hours to just get to the bottom of the issue. And this is one of the main use cases that our customers bring up every single time, as more of, like, this is now the go-to place every time there is any data questions or data issues.Corey: The first and golden rule of cloud economics is step one, turn that shit off.Shinji: [laugh].Corey: When people are using something, you can optimize the hell out of it however you want, but nothing's going to beat turning it off. One challenge is when we're looking at various accounts and we see a Redshift cluster, and it's, “Okay. That thing's costing a few million bucks a year and no one seems to know anything about it.” They keep pointing to other teams, and it turns into this giant, like, finger-pointing exercise where no one seems to have responsibility for it. And very often, our clients will choose not to turn that thing off because on the one hand, if you don't turn it off, you're going to spend a few million bucks a year that you otherwise would not have had to.On the other, if you delete the data warehouse, and it turns out, oh, yeah, that was actually kind of important, now we don't have a company anymore. It's a question of which is the side you want to be wrong on. And in some levels, leaving something as it is and doing something else is always a more defensible answer, just because the first time your cost-saving exercises take out production, you're generally not allowed to save money anymore. This feels like it helps get to that source of truth a heck of a lot more effectively than tracing individual calls and turning into basically data center archaeologists.Shinji: [laugh]. Yeah, for sure. I mean, this is why from the get go, we try to give you all your tables, all of your database, just ordered by popularity. So, you can also see overall, like, from all the tables, whether that's thousands or tens of thousands, you're seeing the most used, has the most number of dependencies on the top, and you can also filter it by all the database tables that hasn't been touched in the last 90 days. And just having this, like, high-level view gives a lot of ideas to the data platform team about how they can optimize usage of their data warehouse.Corey: From where I tend to sit, an awful lot of customers are still relatively early in their data journey. An awful lot of the marketing that I receive from various AWS mailing lists that I found myself on because I've had the temerity to open accounts has been along the lines of oh, data discovery is super important, but first, they presuppose that I've already bought into this idea that oh, every company must be a completely data-driven company. The end. Full stop.And yeah, we're a small bespoke services consultancy. I don't necessarily know that that's the right answer here. But then it takes it one step further and starts to define the idea of data discovery as, ah, you will use it to find a PII or otherwise sensitive or restricted data inside of your datasets so you know exactly where it lives. And sure, okay, that's valuable, but it also feels like a very narrow definition compared to how you view these things.Shinji: Yeah. Basically, the way that we see data discovery is it's starting to become more of an essential capability in order for you to monitor and understand how your data is actually being used internally. 
It basically gives you the insights around sure, like, what are the duplicated datasets, what are the datasets that have that descriptions or not, what are something that may contain sensitive data, so on and so forth, but that's still around the characteristics of the physical datasets. Whereas I think the part that's really important around data discovery that is not being talked about as much is how the data can actually be used better. So, have it as more of a forward-thinking mechanism and in order for you to actually encourage more people to utilize data or use the data correctly, instead of trying to contain this within just one team is really where I feel like data discovery can help.And in regards to this, the other big part around data discovery is really opening up and having that transparency just within the data team. So, just within the data team, they always feel like they do have that access to the SQL queries and you can just go to GitHub and just look at the database itself, but it's so easy to get lost in the sea of metadata that is just laid out as just the list; there isn't much context around the data itself. And that context and with along with the analytics of the metadata is what we're really trying to provide automatically. So eventually, like, this can be also seen as almost like a way to, like, monitor the datasets, like, how you're currently monitoring your applications through Datadog or your website with your Google Analytics, this is something that can be also used as more of a go-to source of truth around what your state of the data is, how that's defined, and how that's being mapped to different business processes, so that there isn't much confusion around data. Everything can be called the same, but underneath it actually can mean very different things. Does that make sense?Corey: No, it absolutely does. I think that this is part of the challenge in trying to articulate value that is, I guess, specific to this niche across an entire industry. The cont