Tony Baer, Principal at dbInsight, joins Corey on Screaming in the Cloud to discuss his definition of what is and isn't a database, and the trends he's seeing in the industry. Tony explains why it's important to try and have an outsider's perspective when evaluating new ideas, and the growing awareness of the impact data has on our daily lives. Corey and Tony discuss the importance of working towards true operational simplicity in the cloud, and Tony also shares why explainability in generative AI is so crucial as the technology advances.

About Tony

Tony Baer, the founder and CEO of dbInsight, is a recognized industry expert in extending data management practices, governance, and advanced analytics to address the desire of enterprises to generate meaningful value from data-driven transformation. His combined expertise in both legacy database technologies and emerging cloud and analytics technologies shapes how clients go to market in an industry undergoing significant transformation. During his 10 years as a principal analyst at Ovum, he established successful research practices in the firm's fastest growing categories, including big data, cloud data management, and product lifecycle management. He advised Ovum clients regarding product roadmap, positioning, and messaging and helped them understand how to evolve data management and analytic strategies as the cloud, big data, and AI moved the goal posts. Baer was one of Ovum's most heavily-billed analysts and provided strategic counsel to enterprises spanning the Fortune 100 to fast-growing privately held companies.

With the cloud transforming the competitive landscape for database and analytics providers, Baer led deep dive research on the data platform portfolios of AWS, Microsoft Azure, and Google Cloud, and on how cloud transformation changed the roadmaps for incumbents such as Oracle, IBM, SAP, and Teradata.
While at Ovum, he originated the term “Fast Data” which has since become synonymous with real-time streaming analytics.

Baer's thought leadership and broad market influence in big data and analytics have been formally recognized on numerous occasions. Analytics Insight named him one of the 2019 Top 100 Artificial Intelligence and Big Data Influencers. Previous citations include Onalytica, which named Baer as one of the world's Top 20 thought leaders and influencers on Data Science; Analytics Week, which named him as one of 200 top thought leaders in Big Data and Analytics; and KDnuggets, which listed Baer as one of the top 12 data analytics thought leaders on Twitter. While at Ovum, Baer was Ovum IT's most visible and publicly quoted analyst, and was cited by Ovum's parent company Informa as Brand Ambassador in 2017. In raw numbers, Baer has 14,000 followers on Twitter, and his ZDNet “Big on Data” posts are read 20,000 – 30,000 times monthly. He is also a frequent speaker at industry conferences such as Strata Data and Spark Summit.

Links Referenced:

dbInsight: https://dbinsight.io/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is brought to us in part by our friends at Red Hat.

As your organization grows, so does the complexity of your IT resources. You need a flexible solution that lets you deploy, manage, and scale workloads throughout your entire ecosystem. The Red Hat Ansible Automation Platform simplifies the management of applications and services across your hybrid infrastructure with one platform. Look for it on the AWS Marketplace.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn.
Back in my early formative years, I was an SRE sysadmin type, and one of the areas I always avoided was databases, or frankly, anything stateful because I am clumsy and unlucky and that's a bad combination to bring within spitting distance of anything that, you know, can't be spun back up intact, like databases. So, as a result, I tend not to spend a lot of time historically living in that world. It's time to expand horizons and think about this a little bit differently. My guest today is Tony Baer, principal at dbInsight. Tony, thank you for joining me.

Tony: Oh, Corey, thanks for having me. And by the way, we'll try and basically knock down your primal fear of databases today. That's my mission.

Corey: We're going to instill new fears in you. Because I was looking through a lot of your work over the years, and the criticism I have—and always the best place to deliver criticism is massively in public—is that you take a very conservative, stodgy approach to defining a database, whereas I'm on the opposite side of the world. I contain information. You can ask me about it, which we'll call querying. That's right. I'm a database.

But I've never yet found myself listed in any of your analyses around various database options. So, what is your definition of databases these days? Where do they start and stop?

Tony: Oh, gosh.

Corey: Because anything can be a database if you hold it wrong.

Tony: [laugh]. I think one of the last things I've ever been called is conservative and stodgy, so this is certainly a way to basically put the thumbtack on my chair.

Corey: Exactly. I'm trying to normalize my own brand of lunacy, so we'll see how it goes.

Tony: Exactly because that's the role I normally play with my clients. So, now the shoe is on the other foot.
What I view a database is, is basically a managed collection of data, and it's managed to the point where essentially, a database should be transactional—in other words, when I basically put some data in, I should have some positive information, I should hopefully, depending on the type of database, have some sort of guidelines or schema or model for how I structure the data. So, I mean, database, you know, even though you keep hearing about unstructured data, the fact is—

Corey: Schemaless databases and data stores. Yeah, it was all the rage for a few years.

Tony: Yeah, except that they all have schemas, just that those schemaless databases just have very variable schema. They're still schema.

Corey: A question that I have is you obviously think deeply about these things, which should not come as a surprise to anyone. It's like, “Well, this is where I spend my entire career. Imagine that. I might think about the problem space a little bit.” But you have, to my understanding, never worked with databases in anger yourself. You don't have a history as a DBA or as an engineer—

Tony: No.

Corey: —but what I find very odd is that unlike a whole bunch of other analysts that I'm not going to name, but people know who I'm talking about regardless, you bring actual insights into this that I find useful and compelling, instead of reverting to the mean of well, I don't actually understand how any of these things work in reality, so I'm just going to believe whoever sounds the most confident when I ask a bunch of people about these things. Are you just asking the right people who also happen to sound confident? But how do you get away from that very common analyst trap?

Tony: Well, a couple of things. One is I purposely play the role of outside observer. In other words, like, the idea is that if basically an idea is supposed to stand on its own legs, it has to make sense. If I've been working inside the industry, I might take too many things for granted.
And a good example of this goes back, actually, to my early days—actually this goes back to my freshman year in college where I was taking an organic chem course for non-majors, and it was taught as a logic course not as a memorization course.

And we were given the option at the end of the term to either, basically, take a final or do a paper. So, of course, me being a writer I thought, I can BS my way through this. But what I found—and this is what fascinated me—is that as long as certain technical terms were defined for me, I found a logic to the way things work. And so, that really informs how I approach databases, how I approach technology today is I look at the logic on how things work. That being said, in order for me to understand that, I need to know twice as much as the next guy in order to be able to speak that because I just don't do this in my sleep.

Corey: That goes a big step toward, I guess, addressing a lot of these things, but it also feels like—and maybe this is just me paying closer attention—that the world of databases and data and analytics have really coalesced or emerged in a very different way over the past decade-ish. It used to be, at least from my perspective, that oh, that the actual, all the data we store, that's a storage admin problem. And that was about managing NetApps and SANs and the rest. And then you had the database side of it, which functionally from the storage side of the world was just a big file or series of files that are the backing store for the database. And okay, there's not a lot of cross-communication going on there.

Then with the rise of object store, it started being a little bit different. And even the way that everyone is talking about getting meaning from data has really seemed to be evolving at an incredibly intense clip lately. Is that an accurate perception, or have I just been asleep at the wheel for a while and finally woke up?

Tony: No, I think you're onto something there.
And the reason is that, one, data is touching us all around ourselves, and the fact is, I mean, you can see it in the same way that all of a sudden that people know how to spell AI. They may not know what it means, but the thing is, there is an awareness of the data that we work with, the data that is about us, it follows us, and with the cloud, this data has—well, I should say not just with the cloud but with smart mobile devices—we'll blame that—we are all each founts of data, and rich founts of data. And people in all walks of life, not just in the industry, are now becoming aware of it and there's a lot of concern about can we have any control, any ownership over the data that should be ours? So, I think that phenomenon has also happened in the enterprise, where essentially where we used to think that the data was the DBAs' issue, it's become the app developers' issue, it's become the business analysts' issue. Because the answers that we get, we're ultimately accountable for. It all comes from the data.

Corey: It also feels like there's this idea of databases themselves becoming more contextually aware of the data contained within them. Originally, this used to be in the realm of, “Oh, we know what's been accessed recently and we can tier out where it lives for storage optimization purposes.” Okay, great, but what I'm seeing now almost seems to be a sense of, people like to talk about pouring ML into their database offerings. And I'm not able to tell whether that is something that adds actual value, or if it's marketing-ware.

Tony: Okay. First off, let me kind of spill a couple of things. First of all, it's not a question of the database becoming aware. A database is not sentient.

Corey: Neither are some engineers, but that's neither here nor there.

Tony: That would be true, but then again, I don't want anyone with shotguns lining up at my door after this—

Corey: [laugh].

Tony: —after this interview is published.
But [laugh] more to the point, though, is that I can see a couple roles for machine learning in databases. One is a database itself, the logs, are an incredible fount of data, of operational data. And you can look at trends in terms of when this—when the pattern of these logs goes this way, that is likely to happen. So, the thing is that I could very easily say we're already seeing it: machine learning being used to help optimize the operation of databases, if you're Oracle, and say, “Hey, we can have a database that runs itself.”

The other side of the coin is being able to run your own machine-learning models in-database as opposed to having to go out into a separate cluster and move the data, and that's becoming more and more of a checkbox feature. However, that's going to be for essentially, probably, like, the low-hanging fruit, like the 80/20 rule. It'll be like the 20% of an ana—of relatively rudimentary, you know, let's say, predictive analyses that we can do inside the database. If you're going to be doing something more ambitious, such as a, you know, a large language model, you probably do not want to run that in the database itself. So, there's a difference there.

Corey: One would hope. I mean, one of the inappropriate uses of technology that I go for all the time is finding ways to—as directed or otherwise—in off-label uses find ways of tricking different services into running containers for me. It's kind of a problem; this is probably why everyone is very grateful I no longer write production code for anyone.

But it does seem that there's been an awful lot of noise lately. I'm lazy. I take shortcuts very often, and one of those is that whenever AWS talks about something extensively through multiple marketing cycles, it becomes usually a pretty good indicator that they're on their back foot on that area.
And for a long time, they were doing that about data and how it's very important to gather data, it unlocks the key to your business, but it always felt a little hollow-slash-hypocritical to me because you're going to some of the same events that I have that AWS throws on. You notice how you have to fill out the exact same form with a whole bunch of mandatory fields every single time, but there never seems to be anything that gets spat back out to you that demonstrates that any human or system has ever read—

Tony: Right.

Corey: Any of that? It's basically a, “Do what we say, not what we do,” style of story. And I always found that to be a little bit disingenuous.

Tony: I don't want to just harp on AWS here. Of course, we can always talk about the two-pizza box rule and the fact that you have lots of small teams there, but I'd rather generalize this. And I think you really—what you're just describing has been my trip through the healthcare system. I had some sports-related injuries this summer, so I've been through a couple of surgeries to repair sports injuries. And it's amazing that every time you go to the doctor's office, you're filling the same HIPAA information over and over again, even with healthcare systems that use the same electronic health records software. So, it's more a function of that it's not just that the technologies are siloed, it's that the organizations are siloed. That's what you're saying.

Corey: That is fair. And I think at some level—I don't know if this is a weird extension of Conway's Law or whatnot—but these things all have different backing stores as far as data goes. And there's a—the hard part, it seems, in a lot of companies once they hit a certain point of maturity is not just getting the data in—because they've already done that to some extent—but it's also then making it actionable and helping various data stores internal to the company reconcile with one another and start surfacing things that are useful.
It increasingly feels like it's less of a technology problem and more of a people problem.

Tony: It is. I mean, put it this way, I spent a lot of time last year, I burned a lot of brain cells working on data fabrics, which is an idea that's in the eye of the beholder. But the ideal of a data fabric is that it's not the tool that necessarily governs your data or secures your data or moves your data or transforms your data, but it's supposed to be the master orchestrator that brings all that stuff together. And maybe sometime 50 years in the future, we might see that.

I think the problem here is both technical and organizational. [unintelligible 00:11:58] a promise, you have all these what we used to call island silos. We still call them silos or islands of information. And actually, ironically, even though in the cloud we have technologies where we can integrate this, the cloud has actually exacerbated this issue because there's so many islands of information, you know, coming up, and there's so many different little parts of the organization that have their hands on that. That's also a large part of why there's such a big discussion about, for instance, data mesh last year: everybody is concerned about owning their own little piece of the pie, and there's a lot of question in terms of how do we get some consistency there? How do we all read from the same sheet of music? That's going to be an ongoing problem. You and I are going to get very old before that ever gets solved.

Corey: Yeah, there are certain things that I am content to die knowing that they will not get solved. If they ever get solved, I will not live to see it, and there's a certain comfort in that, on some level.

Tony: Yeah.

Corey: But it feels like this stuff is also getting more and more complicated than it used to be, and terms aren't being used in quite the same way as they once were.
Something that a number of companies have been saying for a while now has been that customers overwhelmingly are preferring open-source. Open source is important to them when it comes to their database selection. And I feel like that's a conflation of a couple of things. I've never yet found an ideological, purity-driven customer decision around that sort of thing.

What they care about is, are there multiple vendors who can provide this thing so I'm not going to be using a commercially licensed database that can arbitrarily start playing games with seat licenses and wind up distorting my cost structure massively with very little notice. Does that align with your—

Tony: Yeah.

Corey: Understanding of what people are talking about when they say that, or am I missing something fundamental? Which is again, always possible?

Tony: No, I think you're onto something there. Open-source is a whole other can of worms, and I've burned many, many brain cells over this one as well. And today, you're seeing a lot of pieces about the, you know, the—that are basically giving eulogies for open-source. It's—you know, like HashiCorp just finally changed its license and a bunch of others have in the database world. What open-source has meant has been—and I think for practitioners, for DBAs and developers—here's a platform that's been implemented by many different vendors, which means my skills are portable.

And so, I think that's really been the key to why, for instance, like, you know, MySQL and especially PostgreSQL have really exploded, you know, in popularity. Especially Postgres, you know, of late. And it's like, you look at Postgres, it's a very unglamorous database. If you're talking about stodgy, it was born to be stodgy because they wanted to be an adult database from the start.
They weren't the LAMP stack like MySQL.

And the secret of success with Postgres was that it had a very permissive open-source license, which meant that as long as you don't hold the University of California at Berkeley liable, have at it, kids. And so, you see, like, a lot of different flavors of Postgres out there, which means that a lot of customers are attracted to that because if I get up to speed on this Postgres—on one Postgres database, my skills should be transferable, should be portable to another. So, I think that's a lot of what's happening there.

Corey: Well, I do want to call that out in particular because when I was coming up in the naughts, the mid-2000s decade, the lingua franca on everything I used was MySQL, or as I insist on mispronouncing it, my-squeal. And lately, in the same vein, Postgres-squeal seems to have taken over the entire universe, when it comes to the de facto database of choice. And I'm old and grumpy and learning new things is always challenging, so I don't understand a lot of the ways that thing gets managed from the context coming from where I did before, but what has driven the massive growth of mindshare among the Postgres-squeal set?

Tony: Well, I think it's a matter of it's 30 years old and it's—number one, Postgres always positioned itself as an Oracle alternative. And the early years, you know, this is a new database, how are you going to be able to match? At that point, Oracle had about a 15-year headstart on it. And so, it was a gradual climb to respectability. And I have huge respect for Oracle, don't get me wrong on that, but you take a look at Postgres today and they have basically filled in a lot of the blanks.

And so, it now is a very cre—in many cases, it's a credible alternative to Oracle. Can it do all the things Oracle can do? No. But for a lot of organizations, it's the 80/20 rule. And so, I think it's more just a matter of, like, Postgres coming of age.
And the fact is, as a result of it coming of age, there's a huge marketplace out there and so much choice, and so much opportunity for skills portability. So, it's really one of those things where its time has come.

Corey: I think that a lot of my own biases are simply a product of the era in which I learned how a lot of these things work on. I am terrible at Node, for example, but I would be hard-pressed not to suggest JavaScript as the default language that people should pick up if they're just entering tech today. It does front-end, it does back-end—

Tony: Sure.

Corey: —it even makes fries, apparently. There's a—that is the lingua franca of the modern internet in a bunch of different ways. That doesn't mean I'm any good at it, and it doesn't mean at this stage, I'm likely to improve massively at it, but it is the right move, even if it is inconvenient for me personally.

Tony: Right. Right. Put it this way, we've seen—and as I said, I'm not an expert in programming languages, but we've seen a huge profusion of programming languages and frameworks. But the fact is that there's always been a draw towards critical mass. At the turn of the millennium, we thought it was between Java and .NET. Little did we know that basically JavaScript—which at that point was just a web scripting language—[laugh] we didn't know that it could work on the server; we thought it was just a client. Who knew?

Corey: That's like using something inappropriately as a database. I mean, good heavens.

Tony: [laugh]. That would be true. I mean, when I could have, you know, easily just used a spreadsheet or something like that. But so, I mean, who knew? I mean, just like for instance, Java itself was originally conceived for a set-top box. You never know how this stuff is going to turn out. It's the same thing that happened with Python. Python was also a web scripting language. Oh, by the way, it happens to be really powerful and flexible for data science.
And whoa, you know, now Python is—in terms of data science languages—has become the new SAS.

Corey: It really took over in a bunch of different ways. Before that, Perl was great, and I go, “Why would I use—why write in Python when Perl is available?” It's like, “Okay, you know how to write Perl, right?” “Yeah.” “Have you ever read anything a month later?” “Oh…” it's very much a write-only language. It is inscrutable after the fact. And Python at least makes that a lot more approachable, which is never a bad thing.

Tony: Yeah.

Corey: Speaking of what you touched on toward the beginning of this episode, the idea of databases not being sentient, which I equate to being self-aware, you just came out very recently with a report on generative AI and a trip that you wound up taking on this. Which I've read; I love it. In fact, we've both been independently using the phrase [unintelligible 00:19:09] to, “English is the new most common programming language once a lot of this stuff takes off.” But what have you seen? What have you witnessed as far as both the ground truth reality as well as the grandiose statements that companies are making as they trip over themselves trying to position as the forefront leader and all of this thing that didn't really exist five months ago?

Tony: Well, what's funny is—and that's a perfect question because if on January 1st you asked “what's going to happen this year?” I don't think any of us would have thought about generative AI or large language models. And I will not identify the vendors, but I did some that had—was on some advanced briefing calls back around the January, February timeframe. They were talking about things like serverless, they were talking about in-database machine learning and so on and so forth. They weren't saying anything about generative.

And all of a sudden, April, it changed. And it's essentially just another case of the tail wagging the dog. Consumers were flocking to ChatGPT and enterprises had to take notice.
And so, what I saw, in the spring was—and I was at a conference from SAS, I'm [unintelligible 00:20:21] SAP, Oracle, IBM, Mongo, Snowflake, Databricks and others—that they all very quickly changed their tune to talk about generative AI. What we were seeing was for the most part, position statements, but we also saw, I think, the early emphasis was, as you say, it's basically English as the new default programming language or API, so basically, coding assistance, what I'll call conversational query.

I don't want to call it natural language query because we had stuff like Tableau Ask Data, which was very robotic. So, we're seeing a lot of that. And we're also seeing a lot of attention towards foundation models because I mean, what organization is going to have the resources of a Google or an OpenAI to develop their own foundation model? Yes, some of the Wall Street houses might, but I think most of them are just going to say, “Look, let's just use this as a starting point.”

I also saw a very big theme for your models with your data. And where I got a hint of that—it was a throwaway LinkedIn post. It was back in, I think like, February, Databricks had announced Dolly, which was kind of an experimental foundation model, just to use with your own data. And I just wrote three lines in a LinkedIn post, it was on Friday afternoon. By Monday, it had 65,000 hits.

I've never seen anything—I mean, yes, I had a lot—I used to say ‘data mesh' last year, and it would—but didn't get anywhere near that. So, I mean, that really hit a nerve. And other things that I saw, was the, you know, the starting to look at vector storage and how that was going to be supported: was it going to be a new type of database, and hey, let's have AWS come up with, like, an, you know, an [ADF 00:21:41] database here or is this going to be a feature? I think for the most part, it's going to be a feature.
And of course, under all this, everybody's just falling in love, falling all over themselves to get in the good graces of Nvidia. In capsule, that's kind of like what I saw.

Corey: That feels directionally accurate. And I think databases are a great area to point out one thing that's always been a little more disconcerting for me. The way that I've always viewed databases has been, unless I'm calling a RAND function or something like it and I don't change the underlying data structure, I should be able to run a query twice in a row and receive the same result deterministically both times.

Tony: Mm-hm.

Corey: Generative AI is effectively non-deterministic for all realistic measures of that term. Yes, I'm sure there's a deterministic reason things are under the hood. I am not smart enough or learned enough to get there. But it just feels like sometimes we're going to give you the answer you think you're going to get, sometimes we're going to give you a different answer. And sometimes, in generative AI space, we're going to be supremely confident and also completely wrong. That feels dangerous to me.

Tony: [laugh]. Oh gosh, yes. I mean, I take a look at ChatGPT and to me, the responses are essentially, it's a high school senior coming out with an essay response without any footnotes. It's the exact opposite of an ACID database. The reason why we're very—in the database world, we're very strongly drawn towards ACID is because we want our data to be consistent and to get—if we ask the same query, we're going to get the same answer.

And the problem is, is that with generative, you know, based on large language models, computers sound sentient, but they're not. Large language models are basically just a series of probabilities, and so hopefully those probabilities will line up and you'll get something similar. That to me, kind of scares me quite a bit.
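
The contrast drawn in this exchange, a query that returns the same answer on unchanged data versus output drawn from a set of probabilities, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular database or language model works: the `orders` table, its values, and the `NEXT_WORD_PROBS` distribution are all invented for the example.

```python
import random
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10,), (25,), (40,)])
conn.commit()


def total_amount(connection):
    """Run the same aggregate query; unchanged data gives an unchanged answer."""
    return connection.execute("SELECT SUM(amount) FROM orders").fetchone()[0]


# A toy answer distribution standing in for a language model (an assumption,
# not a real model): each candidate answer has a probability of being chosen.
NEXT_WORD_PROBS = {"75": 0.6, "70": 0.3, "eighty": 0.1}


def sample_answer(rng):
    """Draw one answer according to the probabilities above."""
    words = list(NEXT_WORD_PROBS)
    weights = list(NEXT_WORD_PROBS.values())
    return rng.choices(words, weights=weights, k=1)[0]


first, second = total_amount(conn), total_amount(conn)
print(first, second)  # both 75: same query, same data, same result

rng = random.Random()  # unseeded, so the draws differ between runs
print([sample_answer(rng) for _ in range(5)])  # may differ from run to run
```

The query side repeats exactly because nothing in it is random and the data did not change. The sampled side usually lands on the most likely answer but is free to return a less likely one, which is the behavior described above as probabilities that only hopefully line up.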
And I think as we start to look at implementing this in an enterprise setting, we need to take a look at what kind of guardrails can we put on there. And the thing is, that what this led me to was that missing piece that I saw this spring with generative AI, at least in the data and analytics world, is nobody had a clue in terms of how to extend AI governance to this, how to make these models explainable. And I think that's still—that's a large problem. That's a huge nut that it's going to take the industry a while to crack.

Corey: Yeah, but it's incredibly important that it does get cracked.

Tony: Oh, gosh, yes.

Corey: One last topic that I want to get into. I know you said you don't want to over-index on AWS, which, fair enough. It is where I spend the bulk of my professional time and energy—

Tony: [laugh].

Corey: Focusing on, but I think this one's fair because it is a microcosm of a broader industry question. And that is, I don't know what the DBA job of the future is going to look like, but increasingly, it feels like it's going to primarily be picking which purpose-built AWS database—or larger [story 00:24:56] purpose database is appropriate for a given workload. Even without my inappropriate misuse of things that are not databases as databases, there are legitimately 15 or 16 different AWS services that they position as database offerings. And it really feels like you're spiraling down a well of analysis paralysis, trying to pick between all these things. Do you think the future looks more like general-purpose databases, or very purpose-built and each one is this beautiful, bespoke unicorn?

Tony: [laugh]. Well, this is basically a hit on a theme that I've been—you know, we've all been thinking about for years. And the thing is, there are arguments to be made for multi-model databases, you know, versus a for-purpose database. That being said, okay, two things.
One is that what I've been saying, in general, is that—and I wrote about this way, way back; I actually did a talk at the [unintelligible 00:25:50]; it was a throwaway talk, or [unintelligible 00:25:52] one of those conferences—I threw it together and it's basically looking at the emergence of all these specialized databases.

But how I saw, also, there's going to be kind of an overlapping. Not that we're going to come back to Pangea per se, but that, for instance, like, a relational database will be able to support JSON. And Oracle, for instance, does have some fairly brilliant ideas up its sleeve, what they call a JSON duality, which sounds kind of scary, which basically says, “We can store data relationally, but superimpose GraphQL on top of all of this and this is going to look really JSON-y.” So, I think on one hand, you are going to be seeing databases that do overlap. Would I use Oracle for a MongoDB use case? No, but would I use Oracle for a case where I might have some document data? I could certainly see that.

The other point, though, and this is really one I want to hammer on here—it's kind of a major concern I've had—is I think the cloud vendors, for all their talk that we give you operational simplicity and agility are making things very complex with their expanding cornucopia of services. And what they need to do—I'm not saying, you know, let's close down the patent office—what I think we do is we need to provide some guided experiences that says, “Tell us the use case. We will now blend these particular services together and this is the package that we would suggest.” I think cloud vendors really need to go back to the drawing board from that standpoint and look at, how do we bring this all together? How do we really simplify the life of the customer?

Corey: That is, honestly, I think the biggest challenge that the cloud providers have across the board. There are hundreds of services available at this point from every hyperscaler out there.
And some of them are brand new and effectively feel like they're there for three or four different customers and that's about it and others are universal services that most people are probably going to use. And most things fall in between those two extremes, but it becomes such an analysis paralysis moment of trying to figure out what do I do here? What is the golden path?
And what that means is that when you start talking to other people and asking their opinion and getting their guidance on how to do something when you get stuck, it's, “Oh, you're using that service? Don't do it. Use this other thing instead.” And if you listen to that, you get midway through every problem for them to start over again because, “Oh, I'm going to pick a different selection of underlying components.” It becomes confusing and complicated, and I think it does customers largely a disservice. What I think we really need, on some level, is a simplified golden path with easy on-ramps and easy off-ramps where, in the absence of a compelling reason, this is what you should be using.
Tony: Believe it or not, I think this would be a golden case for machine learning.
Corey: [laugh].
Tony: No, but submit to us the characteristics of your workload, and here's a recipe that we would propose. Obviously, we can't trust AI to make our decisions for us, but it can provide some guardrails.
Corey: “Yeah. Use a graph database. Trust me, it'll be fine.” That's your general purpose—
Tony: [laugh].
Corey: —approach. Yeah, that'll end well.
Tony: [laugh]. I would hope that the AI would basically be trained on a better set of training data to not come out with that conclusion.
Corey: One could sure hope.
Tony: Yeah, exactly.
Corey: I really want to thank you for taking the time to catch up with me around what you're doing. If people want to learn more, where's the best place for them to find you?
Tony: My website is dbinsight.io. And on my homepage, I list my latest research.
So, you just have to go to the homepage where you can basically click on the links to the latest and greatest. And I will, as I said, after Labor Day, I'll be publishing my take on my generative AI journey from the spring.
Corey: And we will, of course, put links to this in the [show notes 00:29:39]. Thank you so much for your time. I appreciate it.
Tony: Hey, it's been a pleasure, Corey. Good seeing you again.
Corey: Tony Baer, principal at dbInsight. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that we will eventually stitch together with all those different platforms to create—that's right—a large-scale distributed database.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Excellent Executive Coaching: Bringing Your Coaching One Step Closer to Excelling
Kavita Ganesan is the author of The Business Case for AI: A Leader's Guide to AI Strategies, Best Practices & Real-World Applications and the founder of Opinosis Analytics, an AI advisory and consulting company. What's True, What's Hype, and What's Realistic to Expect from AI in Your Business? How can coaches use AI? Can AI be used to determine the emotional state of an individual? How do corporations use artificial intelligence (AI)? What is the HI-AI Discovery Framework? What are the 4 steps to uncover AI opportunities for your business?
Kavita Ganesan
With over 15 years of experience in AI, Kavita holds a master's degree from the University of Southern California, and a Ph.D. from the University of Illinois at Urbana Champaign, with a specialization in Applied AI, NLP, Machine Learning, and Search Technologies. Kavita has been featured in Forbes, CEOWorld, Verizon, CMSWire, KDNuggets, SDTimes, and Yahoo Finance.
Excellent Executive Coaching Podcast
If you have enjoyed this episode, subscribe to our podcast on iTunes. We would love for you to leave a review. The EEC podcasts are sponsored by MKB Excellent Executive Coaching that helps you get from where you are to where you want to be with customized leadership and coaching development programs. MKB Excellent Executive Coaching offers leadership development programs to generate action, learning, and change that is aligned with your authentic self and values. Transform your dreams into reality and invest in yourself by scheduling a discovery session with Dr. Katrina Burrus, MCC to reach your goals. Your host is Dr. Katrina Burrus, MCC, founder and general manager of www.mkbconseil.ch a company specialized in leadership development and executive coaching.
In this HCI Podcast episode, Dr. Jonathan H. Westover talks with Kavita Ganesan about her book, The Business Case for AI. Kavita Ganesan (https://www.linkedin.com/in/kavita-ganesan/), author of The Business Case for AI: A Leader's Guide to AI Strategies, Best Practices & Real-World Applications, is an AI advisor, strategist & educator. She works with senior management and teams across the enterprise to help them integrate AI and get results from initiatives. With over 15 years of experience, Kavita has scaled and delivered multiple successful AI initiatives for Fortune 500 companies as well as smaller organizations. She has also helped leaders and practitioners around the world through her blog posts, coaching sessions, and open-source tools. Kavita holds degrees from prestigious computer science programs, specifically a master's degree from the University of Southern California, and a Ph.D. from the University of Illinois at Urbana Champaign, with a specialization in Applied AI, NLP, Search Technologies, and Machine Learning. Kavita has been featured by numerous media outlets including CEOWorld, Forbes, KDNuggets, Authority Magazine, CMSWire, Techopedia, and tED Magazine. Please consider supporting the podcast on Patreon and leaving a review wherever you listen to your podcasts! Check out FindLaw at FindLaw.com. Check out Shopify at www.shopify.com/hci. Check out the HCI Academy: Courses, Micro-Credentials, and Certificates to Upskill and Reskill for the Future of Work! Check out the LinkedIn Alchemizing Human Capital Newsletter. Check out Dr. Westover's book, The Future Leader. Check out Dr. Westover's book, 'Bluer than Indigo' Leadership. Check out Dr. Westover's book, The Alchemy of Truly Remarkable Leadership. Check out the latest issue of the Human Capital Leadership magazine. Each HCI Podcast episode (Program, ID No. 
592296) has been approved for 0.50 HR (General) recertification credit hours toward aPHR™, aPHRi™, PHR®, PHRca®, SPHR®, GPHR®, PHRi™ and SPHRi™ recertification through HR Certification Institute® (HRCI®). Each HCI Podcast episode (Program ID: 24-DP529) has been approved for 5.00 HR (General) SHRM Professional Development Credits (PDCs) for SHRM-CP and SHRM-SCPHR recertification through SHRM, as part of the knowledge and competency programs related to the SHRM Body of Applied Skills and Knowledge™ (the SHRM BASK™). Each HCI Podcast episode is also recognized by the Association for Talent Development to offer Recertification Credits and has been approved for 0.50 recertification hours toward APTD® and CPTD® recertification activities. Learn more about your ad choices. Visit megaphone.fm/adchoices
Ever feel like you can't see the forest through the trees? We get it. In this episode, Michael sits down with Maria Zentsova, an ML developer and data analyst who teaches us how to get a handle on our data. “We all know that more data leads to more accuracy, so it's important to get hands on.” - Maria Zentsova In This Episode 1) What KDNuggets is making data science and machine learning easier this year 2) The Do's and Don'ts of data feeds and analysis in 2022 3) Why understanding this ONE issue is the key to avoiding relevance issues and finding your way out of the weeds Sponsors Top End Devs (https://topenddevs.com/) Coaching | Top End Devs (https://topenddevs.com/coaching) Links Build a Serverless News Data Pipeline using ML on AWS Cloud - KDnuggets (https://www.kdnuggets.com/2021/11/build-serverless-news-data-pipeline-ml-aws-cloud.html) Special Guest: Maria Zentsova.
The scientific method consists of systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses. But what does this mean in the context of Data Science, where a wealth of unstructured data and variety of computational models can be used to deduce an insight and inform a stakeholder's decision?
In this bite episode we discuss the importance of the scientific method for data scientists. Data science is, after all, the application of scientific techniques and processes to large data sets to obtain impact in a given application area. So we ask how the scientific method can be harnessed efficiently and effectively when there is so much uncertainty in the design and interpretation of an experiment or model.
Further Reading and Resources
Paper: "Defining the scientific method" via Nature https://www.nature.com/articles/nmeth0409-237
Paper: "Big data: the end of the scientific method" via The Royal Society https://royalsocietypublishing.org/doi/10.1098/rsta.2018.0145
Article: "The Data Scientific Method" via Medium https://towardsdatascience.com/a-data-scientific-method-80caa190dbd4
Article: "The scientific method of machine learning" via Datascience.aero https://datascience.aero/scientific-method-machine-learning/
Article: "Putting the 'Science' Back in Data Science" via KDnuggets https://www.kdnuggets.com/2017/09/science-data-science.html
Podcast: "In Our Time: The Scientific Method" via BBC Radio 4 https://www.bbc.co.uk/programmes/b01b1ljm
Podcast: "The end of the scientific method" via The Economist https://www.economist.com/podcasts/2019/11/27/the-end-of-the-scientific-method
Video: "The Scientific Method" via Coursera https://www.coursera.org/lecture/data-science-fundamentals-for-data-analysts/the-scientific-method-Ha5hq
Cartoon: "Machine Learning" via xkcd https://xkcd.com/1838/
Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.
Recording date: 30 April 2021
Intro music by Music 4 Video Library (Patreon supporter)
Nathan was crucial in implementing cutting edge data infrastructure technologies such as DataRobot, Alteryx, and Snowflake. While at Symphony, Nathan was named a “Data Science Superhero” by DataRobot. Several years ago, Nathan was invited to lead the Healthcare User Group at Alteryx, a role he still holds today, and he is Core Certified in Alteryx Designer. He is also an AHIMA Certified Health Data Analyst and a HIMSS Certified Professional in Healthcare Information and Management Systems. Nathan shares so much of his knowledge about data analytics. He shares tools his company has found most useful. He also shares great starting points for using data successfully in your organization.
Show Notes:
[01:06] Nathan shares how he got into IT and specifically healthcare IT. He has worked in the IT space for almost 21 years from an analytics background.
[02:37] He was a programmer for seven years and then he moved into data.
[04:19] He loved technology from the beginning, but he loved building it. He likes that you can rapidly build on the data side.
[06:52] Since he was more technical and less business savvy he knew he needed to learn more about business and accounting.
[08:02] Don’t just look at the data set and think about solving this problem in front of you. Think of it from the business impact it can have. You are solving more than a data problem. You are solving a business problem.
[10:01] In 2020, no one can say they don’t have a computer job.
[10:42] You have to constantly think about what you are doing with the data and what you know that can help solve problems.
[12:01] They researched many tools, and they settled on Ultra Excel. Nathan shares other tools they have found to be a great fit for their organization.
[13:12] You have to fit the tool and what it does well with the end-user and always keep the end-user in mind.
[15:13] We are going to see people start standing up APIs.
[16:53] They set up a lot of their data infrastructure in the cloud and they still have firewalls and security in place. It allows you to move data much quicker.
[18:01] They provide internal training for the more difficult problems they encounter. Vendors usually provide baseline training and support. Many vendors also have great online communities that can be a great resource to answer questions.
[19:52] KDNuggets is a great resource for analytics.
[21:57] No one will see what you are doing unless you talk about it. It is a balancing act to be humble and also talk about what you are doing well.
[22:41] A public speaking class in college is money. You will need to have the strength and courage to talk in large groups.
[23:30] Don’t be afraid to put yourself out there and talk about what you’ve done. It may come with criticism.
[23:53] Learn to say no. Be judicious in the projects you choose. Think about what things are going to matter to the business and focus on those. Cut out the things that won’t really have an impact.
[24:27] You always have to be learning something new.
[26:01] Nathan shares his best worst boss story.
[27:53] Sometimes people are not talented at all but have a great attitude and they can be trained, taught, and elevated. We can’t teach attitude.
[29:52] Most of the problems that people are having come from not understanding the time nature of data.
[30:21] On more advanced teams they look at the culture of the team.
[31:49] Every company has one metric that matters most. If your company doesn’t know that, then that is the place to start.
[33:02] There are opportunities all over the place to find savings.
Links and Resources:
State of the CIO Podcast Website
State of the CIO Podcast on Apple Podcasts
Dan on LinkedIn
Nathan on LinkedIn
Nathan on YouTube
Lean Analytics
In this episode #49, the hosts Naveen Samala & Sudhakar Nagandla have interacted with another guest Rohit. Rohit Gupta is a Technical Trainer, who holds a Master's degree in Informatics, from the University of Delhi, and a Bachelor's degree in Electronic Science, from the University of Delhi. He is also C# Corner MVP (Most Valuable Professional), Author, Speaker. Rohit is deeply interested in the field of Artificial Intelligence, Machine Learning, and Python. His aim is to make the World's First Human-Like Humanoid. Rohit believes in the statement “bonding between Machines and Humans will make this world a better place.” Listen to Rohit's lessons on: Basics of Computer Vision and its relation to AI, DS & ML How computers can SEE? How image processing is different from Computer Vision? Preprocessing and post processing of Computer Vision Why is it necessary for a machine to see & detect correctly? How can one start learning Computer Vision Commonly used Computer Vision libraries Current limitations & Future of Computer vision Here are the resources shared by Rohit: NPTEL Website: https://nptel.ac.in/ Computer Vision: https://nptel.ac.in/courses/106/105/106105216/ Krish Naik Complete OpenCV: https://bit.ly/3k80jXy YouTube Channel: https://www.youtube.com/user/krishnaik06 OpenCV Downloadable tutorials: https://docs.opencv.org/ PDF Documentation: https://docs.opencv.org/2.4/opencv2refman.pdf Online Tutorials: https://docs.opencv.org/master/d6/d00/tutorial_py_root.html Official OpenCV Courses: https://opencv.org/courses/ Sentdex YouTube Channel: https://www.youtube.com/user/sentdex Computer Vision Playlist: https://bit.ly/31eDwBO Website: https://nnfs.io/ Computer Vision: C# Corner: https://www.c-sharpcorner.com/topics/computer-vision Coursera Nanodegree: https://bit.ly/348k6Ax TEDx: https://bit.ly/3lX0NQD Udacity Free Resources: https://bit.ly/3o2k6Kb Stanford Resources: https://bit.ly/3lYZosR Robofied: https://www.instagram.com/robofied/?hl=en neuralnet.ai: 
https://www.instagram.com/neuralnet.ai/?hl=en Pytorch: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html PyImageSearch Website: https://www.pyimagesearch.com YouTube Channel: https://bit.ly/3j8jxLa Courses: https://www.pyimagesearch.com/books-and-courses/ OpenCV Tutorials: https://www.pyimagesearch.com/category/opencv/ Machine Learning India LinkedIn: https://www.linkedin.com/company/mlindia/ Website: https://linktr.ee/mlindia Twitter: https://twitter.com/ml_india_ Instagram: https://instagram.com/ml.india Telegram: https://t.me/machinelearning24x7 Rohit Gupta LinkedIn: https://www.linkedin.com/in/rohit-gupta-ai/ Twitter: https://twitter.com/RohitGuptaAI Instagram: https://www.instagram.com/rohitgupta161093/ Facebook: https://www.facebook.com/rohit.gupta.33633 Gmail: gupta.rohitg.rohit900@gmail.com f. Telegram: https://t.me/rohit_gupta_ai C# Corner Profile: https://www.c-sharpcorner.com/members/rohit-gupta95 Enjoy the episode! Do not forget to share your suggestions or feedback at theguidingvoice4u@gmail.com or by messaging at +91 9494 587 187 Subscribe to our YouTube Channel: https://www.youtube.com/c/TheGuidingVoice Also, follow The Guiding Voice on Social Media: LinkedIn: https://www.linkedin.com/company/theguidingvoice Facebook: http://facebook.com/theguidingvoice4u Twitter: http://twitter.com/guidingvoice Instagram: https://www.instagram.com/theguidingvoice4u/ Pinterest: https://in.pinterest.com/theguidingvoice4u/pins/ #artificialIntelligence, #AI, #ML, #DataScience, #AIBasics, #KDNuggets, #MLIndia, #machinelearning, #bigdata, #datascience, #deeplearning, #supervisedlearning, #unsupervisedlearning, #Kaggle, #career, #jobs, #corona, #pandemic, #postpandemic, #careerguidance, #mentorship, #careerpath, #progression, #management, #leadership, #crisis, #job, #midcareer, #youngprofessionals, #careergraph, #TGV, #theguidingvoice
In this episode #41, the hosts Naveen Samala & Sudhakar Nagandla have interacted with another guest Rohit. Rohit Gupta is a Technical Trainer, who holds a Master's degree in Informatics, from the University of Delhi, and a Bachelor's degree in Electronic Science, from the University of Delhi. He is also C# Corner MVP (Most Valuable Professional), Author, Speaker. Rohit is deeply interested in the field of Artificial Intelligence, Machine Learning, and Python. His aim is to make the World's First Human-Like Humanoid. Rohit believes in the statement “bonding between Machines and Humans will make this world a better place.” Listen to Rohit's Insights on: Basics of Neural Networks and its relation to AI, DS & ML. How Artificial Neural Networks (ANN) work? Types of Artificial Neural Networks Shallow Neural Networks or SNN and Deep Neural Networks or DNN Concept of Bias in ANN ANN project ideas for enthusiasts How ANN is different from Machine Learning? How to start learning ANN? Resources & Books for ANN Here are the resources shared by Rohit: NPTEL YouTube Channel: https://www.youtube.com/user/nptelhrd Website: https://nptel.ac.in/ Neural Network: https://bit.ly/3lLT8EX Krish Naik Complete Deep Learning: https://bit.ly/30RJ6K4 YouTube Channel: https://www.youtube.com/user/krishnaik06 Deep Learning with Keras: https://bit.ly/3lu0soc 3Blue1Brown YouTube Channel: https://bit.ly/2GDKVDT Neural Network Playlist: https://bit.ly/3dfqQiZ Sentdex YouTube Channel: https://www.youtube.com/user/sentdex Deep Learning Basics: https://bit.ly/30UWRrt Website: https://nnfs.io/ Deep Learning: C# Corner: https://www.c-sharpcorner.com/article/deep-learning/ Kaggle: https://www.kaggle.com/learn/deep-learning Udacity Nanodegree: https://www.udacity.com/course/deep-learning-nanodegree--nd101 TEDx: https://www.ted.com/search?q=deep+learning Andrew Ng Course: https://www.coursera.org/learn/machine-learning Python Implementation of Andrew Ng Course: https://bit.ly/3gNOofl Deeplearning.ai : 
https://www.deeplearning.ai/ Optimizing a Neural Network: https://www.c-sharpcorner.com/article/how-to-optimize-a-neural-network/ Machine Learning India LinkedIn: https://www.linkedin.com/company/mlindia/ Website: https://linktr.ee/mlindia Twitter: https://twitter.com/ml_india_ Instagram: https://instagram.com/ml.india Telegram: https://t.me/machinelearning24x7 Rohit Gupta LinkedIn: https://www.linkedin.com/in/rohit-gupta-ai/ Twitter: https://twitter.com/RohitGuptaAI Instagram: https://www.instagram.com/rohitgupta161093/ Facebook: https://www.facebook.com/rohit.gupta.33633 Gmail: gupta.rohitg.rohit900@gmail.com Telegram: https://t.me/rohit_gupta_ai C# Corner Profile: https://www.c-sharpcorner.com/members/rohit-gupta95 Enjoy the episode! Do not forget to share your suggestions or feedback at theguidingvoice4u@gmail.com or by messaging at +91 9494 587 187 Subscribe to our YouTube Channel: https://www.youtube.com/c/TheGuidingVoice Also, follow The Guiding Voice on Social Media: LinkedIn: https://www.linkedin.com/company/theguidingvoice Facebook: http://facebook.com/theguidingvoice4u Twitter: http://twitter.com/guidingvoice Instagram: https://www.instagram.com/theguidingvoice4u/ Pinterest: https://in.pinterest.com/theguidingvoice4u/pins/ #neuralnetworks, #NN, #artificialneuralnetworks, #ANN, #SNN, #DNN, #RNN #artificialIntelligence, #AI, #ML, #DataScience, #AIBasics, #KDNuggets, #MLIndia, #machinelearning, #bigdata, #datascience, #deeplearning, #supervisedlearning, #unsupervisedlearning, #Kaggle, #career, #jobs, #corona, #pandemic, #postpandemic, #careerguidance, #mentorship, #careerpath, #progression, #management, #leadership, #crisis, #job, #midcareer, #youngprofessionals, #careergraph, #TGV, #theguidingvoice
In this episode #22, the hosts Naveen Samala & Sudhakar Nagandla have interacted with another guest Rohit. Rohit Gupta is a Technical Trainer, who holds a Master's degree in Informatics, from the University of Delhi, and a Bachelor's degree in Electronic Science, from the University of Delhi. He is also C# Corner MVP (Most Valuable Professional), Author, Speaker. Rohit is deeply interested in the field of Artificial Intelligence, Machine Learning, and Python. His aim is to make the World's First Human-Like Humanoid. Rohit believes in the statement “bonding between Machines and Humans will make this world a better place.” Listen to Rohit's Insights on: Artificial Intelligence (AI) definition for a 5th grader Advantages of AI Domains or Businesses that can benefit from AI? Biggest Players of AI in the Industry and what we can learn from them? Is AI an existential threat to humanity? Does it impact the job market? Career Path in AI and where should one begin from? Advice to Startup founders from India in AI domain Are there any prerequisites for learning AI Difference between AI & ML Roles in AI & How they evolve? 
Future of AI Here are the resources shared by Rohit: Machine Learning: Basic to Intermediate: C# Corner: https://www.c-sharpcorner.com/members/rohit-gupta95/articles/machine-learning Kaggle: https://www.kaggle.com/learn/intro-to-machine-learning Andrew Ng Course: https://www.coursera.org/learn/machine-learning Python Implementation of Andrew Ng Course: https://bit.ly/3gNOofl Udacity Nanodegree: https://www.udacity.com/course/machine-learning-engineer-nanodegree--nd009t TEDx: https://www.ted.com/search?q=machine+learning What is Artificial Intelligence: C# Corner: https://www.c-sharpcorner.com/article/a-complete-artificial-intelligence-tutorial/ edX: https://www.edx.org/learn/artificial-intelligence Udacity Nanodegree: https://www.udacity.com/course/ai-artificial-intelligence-nanodegree--nd898 TEDx: https://www.ted.com/search?q=artificial+intelligence Introduction to Python: C# Corner: https://www.c-sharpcorner.com/article/python-basics/ Kaggle: https://www.kaggle.com/learn/python Udemy Course: https://bit.ly/3elEPTg Udacity Course: https://www.udacity.com/course/introduction-to-python--ud1110 Data Science: C# Corner: https://www.c-sharpcorner.com/article/data-science/ KDNuggets: https://www.kdnuggets.com/ edX: https://www.edx.org/learn/data-analysis Udacity Nanodegree: https://www.udacity.com/course/data-scientist-nanodegree--nd025 Udacity Nanodegree: https://www.udacity.com/course/data-scientist-nanodegree--nd027 Deep Learning: C# Corner: https://www.c-sharpcorner.com/article/deep-learning/ Kaggle: https://www.kaggle.com/learn/deep-learning Udacity Nanodegree: https://www.udacity.com/course/deep-learning-nanodegree--nd101 TEDx: https://www.ted.com/search?q=deep+learning Machine Learning India LinkedIn: https://www.linkedin.com/company/mlindia/ Website: https://linktr.ee/mlindia Twitter: https://twitter.com/ml_india_ Instagram: https://instagram.com/ml.india Telegram: https://t.me/machinelearning24x7 Rohit Gupta LinkedIn: 
https://www.linkedin.com/in/rohit-gupta-ai/ Twitter: https://twitter.com/RohitGuptaAI Instagram: https://www.instagram.com/rohitgupta161093/ Facebook: https://www.facebook.com/rohit.gupta.33633 Gmail: gupta.rohitg.rohit900@gmail.com Telegram: https://t.me/rohit_gupta_ai C# Corner Profile: https://www.c-sharpcorner.com/members/rohit-gupta95 Enjoy the episode! Do not forget to share your suggestions or feedback at theguidingvoice4u@gmail.com or by messaging at +91 9494 587 187 Subscribe to our YouTube Channel: https://www.youtube.com/c/TheGuidingVoice Also, follow The Guiding Voice on Social Media: LinkedIn: https://www.linkedin.com/company/theguidingvoice Facebook: http://facebook.com/theguidingvoice4u Twitter: http://twitter.com/guidingvoice Instagram: https://www.instagram.com/theguidingvoice4u/ Pinterest: https://in.pinterest.com/theguidingvoice4u/pins/ #artificialIntelligence #AI #ML #DataScience #AIBasics #KDNuggets #MLIndia #machinelearning #bigdata #datascience #deeplearning #supervisedlearning #unsupervisedlearning
Python versus R. It's a heated debate. We won't solve this raging controversy today, but we will peek into the history of Python, particularly in the open source community surrounding it, and see how it came to be what it is today—a well used and flexible programming language. Travis Oliphant: Wes McKinney did a great job in creating Pandas . . . not just creating it but organized a community around it, which are two independent steps and both necessary, by the way. A lot of people get confused by open source. They sometimes think you just kind of going to get people together and open source emerges from the foam, but what ends up happening, I’ve seen this now at least eight, nine different times, both with projects I’ve had a chance and privilege to interact with, but also other people's projects. It really takes a core set of motivated people, usually not more than three. Ginette: I’m Ginette. Curtis: And I’m Curtis. Ginette: And you are listening to Data Crunch. Curtis: A podcast about how applied data science, machine learning, and artificial intelligence are changing the world. Ginette: A Vault Analytics production. Ginette: This episode of Data Crunch is supported by Lightpost Analytics, a company helping bridge the last mile of AI: making data and algorithms understandable and actionable for a non-technical person, like the CEO of your company. Lightpost Analytics is offering a training academy to teach you Tableau, an industry-leading data visualization software. According to Indeed.com, the average salary for a Tableau Developer is above $50 per hour. If done well, making data understandable can create breakthroughs in your company and lead to recognition and promotions in your job. Go to lightpostanalytics.com/datacrunch to learn more and get some freebies. Here at Data Crunch, we love playing with artificial intelligence, machine learning, and deep learning, so we started a fun new side project. 
We just launched a new podcast that tests the boundaries of what can be done with Google’s cutting-edge deep learning speech generation algorithms. We use surprisingly human-like voices to host the podcast that reads all the unusual Wikipedia articles you haven’t had a chance to read yet, like chicken hypnosis, the history of an amusing German conspiracy theory, strange trends in Russian politics, and much more to come. It’s worth listening to to hear what this tech sounds like and you’ll learn unique and bizarre trivia that you can share at your next dinner party. Search for a podcast called “Griswold the AI Reads Unusual Wikipedia Articles,” now found on all your favorite popular podcast platforms. Curtis: There has been a heated, ongoing debate about which programming language is better when working with machine learning and data analytics: Python or R, and while we won’t be wrestling with that particular question, we will overview a bit of history for both and then dive into significant history behind one of these languages, Python, with a major contributor to the language, a man who significantly influenced the way that data scientists use Python today. Ginette: As a very short historical background, Python came to the scene in 1991 when Guido van Rossum developed it. His language has developed a reputation as easy to use because its syntax is simple, it’s versatile, and it has a shallow learning curve. It’s also a general purpose language that is used beyond data analysis and great for implementing algorithms for production use. As for R, it followed shortly after Python. In 1995, Ross Ihaka and Robert Gentleman created it as an easier way to do data analysis, statistics, and graphic models, and it was mainly used in academia and research until more recently. It’s specifically aimed at statistics, and it has extensive libraries and a solid community. As a controversial side note, according to Gregory Piatetsky-Shapiro’s KDNuggets poll, late last year,
In this episode of the SuperDataScience Podcast, I chat with the President and Editor at KDNuggets, Gregory Piatetsky-Shapiro. We talk about recent advancements in Data Science, how Reinforcement Learning can greatly improve AI capabilities, and the growing trend of automation in the field of data science. If you enjoyed this episode, check out show notes, resources, and more at www.superdatascience.com/175
Gregory Piatetsky-Shapiro, Ph.D., is the President of KDnuggets, a leading site for Analytics, Big Data, Data Science, and Machine Learning. Gregory is a co-founder of KDD (the Knowledge Discovery and Data Mining conferences), a top research conference in the field. He is also a co-founder and past chair of ACM SIGKDD, a professional association for Data Mining and Data Science, and a well-known Data Scientist.
Interviewer: Rajib Bahar, Shabnam Khan
Agenda:
RB - According to Forbes, you're one of the top Big Data Influencers... Your KDNuggets site has well over 60 awards and mentions as a leading publication. I often find myself reading and tweeting your articles from KDNuggets. There are few places to find exciting articles on Data Science, AI, Big Data... Please tell us a brief history of KDNuggets...
SB - I understand you have transitioned from a researcher role to a high-level editor role. What do you enjoy about it?
RB - What are your thoughts on global trends in Machine Learning, AI, & Big Data?
SB - Where does automation of Data Science come into play? Is that a helpful process or a distraction from useful analysis? How do you implement it?
RB - We all have bias. Does Big Data suffer from any bias such as implicit bias?
SB - Who are these so-called Citizen Data Scientists? Why are they important? How can they serve society at large?
RB - How do we connect with you on Twitter & social media?
Additional Reference Materials from Gregory for our listeners.
Data Science Automation: Data Scientists Automated and Unemployed by 2025? 
http://www.kdnuggets.com/2015/05/data-scientists-automated-2025.html The Current State of Automated Machine Learning http://www.kdnuggets.com/2017/01/current-state-automated-machine-learning.html Trends: Machine Learning overtaking Big Data http://www.kdnuggets.com/2017/05/machine-learning-overtaking-big-data.html Optimism about AI improving society is high, but drops with experience developing AI systems http://www.kdnuggets.com/2017/07/optimism-ai-impact-experience.html Bias in Big Data http://www.sciencemag.org/news/2017/04/even-artificial-intelligence-can-acquire-biases-against-race-and-gender https://www.theguardian.com/technology/2017/apr/13/ai-programs-exhibit-racist-and-sexist-biases-research-reveals https://www.forbes.com/sites/mariyayao/2017/05/01/dangers-algorithmic-bias-homogenous-thinking-ai/#42d724d270b3 Mirage of a Citizen Data Scientist: http://www.kdnuggets.com/2016/03/mirage-citizen-data-scientist.html Citizen Data Scientist Cartoon: http://www.kdnuggets.com/2016/03/cartoon-citizen-data-scientist.html Overfitting The Cardinal Sin of Data Mining and Data Science: Overfitting http://www.kdnuggets.com/2014/06/cardinal-sin-data-mining-data-science.html Music: www.freesfx.co.uk
Ruben is head of data at VSCO, a creative platform and a community of expression. His focus is on designing the data strategy, understanding user behavior, and designing metrics for the entire organization. Prior to VSCO, Ruben was the head of Content Analytics at Udemy. And before that he had a completely different career developing and troubleshooting chip manufacturing technology. Ruben’s interests span statistics, machine learning, economics, and photography. Interviewer: Rajib Bahar, Shabnam Khan Agenda: - In your role, you're in charge of Data Governance at VSCO... Would you like to elaborate on that? - According to Gregory's article in KDnuggets, the top 3 most heavily used Data Science algorithms were Regression, Clustering, and Decision Trees. In your projects, how useful were they? - What are some of your favorite KDnuggets-like online resources, where you find good code samples, articles, podcasts, etc.? - Are you applying evolving branches of Artificial Intelligence such as Machine Learning, Deep Learning, or Reinforcement Learning in your organization? - Many people are aware of Instagram... How is VSCO different from that platform? - Now, on to some art-related questions... Are you a portrait, landscape, or travel photographer? Or do you experiment with different techniques? - Please tell us where we can find you on social media. Music: www.freesfx.co.uk
How did one boy's stuffed yellow elephant permanently intertwine itself in history? What is a data scientist? Why is right now the golden age for data science? We take a crack at all three of these questions, the second two with the help of Gregory Piatetsky-Shapiro and Ryan Henning. Transcript Ginette: “Over the past few years, we’ve seen these news flashes: “An article in Harvard Business Review in 2014, titled: Data Scientist: the Sexiest Job of the 21st Century “Mashable’s article in 2015: So You Wanna Be a Data Scientist? A Guide to 2015’s Hottest Profession “Business Insider, 2016: Data Science was the #1 Profession as Rated by Glassdoor “A data science industry observer, KDnuggets, 2017: Data Scientist: Best Job in America, Again, which cites the most recent Glassdoor report outlining the very top jobs in America: “It turns out, four of the five top US jobs deal with data. In descending order, we find data scientist, devops engineer, data engineer, and analytics manager.” Curtis: “With four out of five of these top jobs orbiting data, clearly something’s going on here.” Ginette: “I’m Ginette.” Curtis: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “Today is a culmination of everything we’ve talked about in our series on the history of data science. This is where all the contributions of Florence Nightingale, William Playfair, Ronald Fisher, Ada Lovelace, and many others come together in one place. We’ll add a couple more people to this list to answer these two questions: ‘What is a data scientist? And why is right now the golden age of data science?’” Curtis: “According to IBM, ‘every day, we create 2.5 quintillion bytes of data.’ But what does a quintillion actually look like? 
“Well, if you take one quintillion pennies, you could actually place them face up end to end and blanket the entire surface of the earth 1.5 times over. Or think about one quintillion ants. That would be like taking all of the ants that exist today on planet earth according to some estimates, and then you have to take that number and multiply it by 100. So, that ant pile in your front yard becomes 100 ant piles in your front yard. Basically ants take over the earth. And we make 2.5 quintillion bytes every single day! “The next question is, how much information does that actually represent? It’s 250,000 times the amount of information that all the printed material in the Library of Congress contains. And we make that every single day.” Ginette: “In 2013, SINTEF published this stat, quote: ‘90% of the world’s data has been created in the preceding two years.’ According to one Ph.D. technologist, this has been true for the last 30 years because every two years, we produce 10 times as much data.” Curtis: “This exponential growth is insane. Just as an example of this type of growth rate, if you take a hypothetical scenario, and you take the world’s population, and say it starts growing as rapidly as data is growing now, it would look like this: Currently, the world’s population, 7 billion people, could fit in the size of Texas if they were living as densely as they do in New York City. Now, in two years’ time with this growth rate, you’d actually have to cover the entire United States and half of Canada with people living in New York City-like density. And if you extrapolate that out ten years keeping the growth rate the same, you’d have to cover the entire planet, including all of the oceans, with New York Cities, and then you’d have to do that with 100–150 additional earths to fit all of those people. 
That’s the kind of growth rate we’re talking about.” Ginette: “With data collection on the rise, one report goes so far as to say that only the data literate will have the chops to be executives in the future, quote:
The history books teach that slavery ended, but it still exists; it’s just morphed its form: different commodity, different location, but same abuses. The commodity is seafood. The location, Southeast Asia. The abuses, forced servitude with all its ugly associations. Some people make a substantial living off illegal, unregulated, and unreported (IUU) fishing, which fuels a dark underground. How is big data angling to stop it? Find out in our next two episodes. Transcript: Michele Kuruc: “People who were seeking better lives and, and coming to look for work were kidnapped by unscrupulous dealers, who forced them into lives we can’t even imagine.” Ginette Methot: “I’m Ginette.” Curtis Seare: “And I’m Curtis.” Ginette: “And you are listening to Data Crunch.” Curtis: “A podcast about how data and prediction shape our world.” Ginette: “A Vault Analytics production.” Ginette: “Welcome back to Data Crunch! We took a bit of a break over the holidays, and we hope you were able to too. “So upward and onward to 2017. What are we up to this year? We’ll be finishing our data science history miniseries for you, and we’ll be meeting some really cool people from KDnuggets, Galvanize Austin, and Datascope in Chicago. But before we do those episodes, we have to pivot because with major recent developments, this particular episode deserves to come out now. “The lives we can’t even imagine look like this according to the Associated Press. One Burmese man left his village when he was 18 years old. He followed a recruiter who promised him a construction job. When he arrived in Thailand, his captors held him with little food or water for a month. He was then forced onto a fishing boat. He was told that he was sold and would never be rescued. In that fishing environment, sometimes he worked 24 hours a day. He and his fellow fishers were whipped with stingray tails and shocked with electric devices. 
They were told during their time fishing that they would never be let go, not even when they died, and men in similar situations were sometimes sold from ship captain to ship captain. “If they tried to escape the work, they were locked in cages on remote islands. In the 22 years he was away from home, he asked to go home twice. The first time he asked, the company official chucked a helmet at his head, which left a bloody gash that he had to hold closed. The second time he begged to go home, he was chained to the boat deck for three days in the blistering sun and when the night came, it was rainy, and he could do little to protect himself from it. During that three-day period, he had no food. He amazingly fashioned a lock pick and unlocked his shackles. He knew if he was caught, he’d be killed, so he dove into the water under the cover of night and swam ashore, hiding for his life. “You might ask why he didn’t go to local officials. The answer is he couldn’t because they might sell him back to the ship captains. So after eight years in the jungle hiding from the fishing companies, he finally got to go home because of the AP’s reporting. This is modern-day slavery. Every year, thousands of people are tricked or sold into this type of slavery in order to catch fish for lucrative markets. “If you’ve ever read Solomon Northup’s gripping autobiography, Twelve Years a Slave, the similarity is eerie. They are both free men who are initially unknowingly abducted. They’re shackled, beaten into servitude, and forced to work in harsh conditions for many, many years. Both are desperate to go home to their families, and both experience miraculous escapes from tyrannical systems. But unfortunately, not everyone escapes. “This is a huge problem, and it’s frequently linked to illegal, unregulated, and unreported fishing, commonly known as IUU fishing. 
Unfortunately, IUU fishing is linked to some of the ugliest transnational crimes: modern-day slavery, human trafficking, drug trafficking,
What is artificial intelligence? What is artificial intelligence? And great answers to most of what we talked about, by a proper computer guy from Stanford University (Formal Reasoning Group) What is Skynet? (Wikia) What is computer chess? (Wikipedia) Google computer wins final game against South Korean Go master (Physics.org) Google has gotten very good at predicting traffic (Tech Insider) When will AI be created? (Machine Intelligence Research Institute) What is intelligence? (Machine Intelligence Research Institute) What is consciousness? (big think) What it will take for computers to be conscious (MIT Technology Review) Learning how little we know about the brain (The New York Times) Google traffic (Google) What is artificial consciousness? (Wikipedia) Kegan's 'orders of mind' (NZCR) Kegan's theory of the evolution of consciousness (Stanford University) Consciousness may be an 'emergent property' of the brain (Quora) A good discussion between Sam Harris and Neil deGrasse Tyson (Sam Harris' podcast) There are billions of connections in your brain (The Astronomist) The 'Go' game (Wikipedia) The number of possible Go games is reeeeally large...potentially more than the number of atoms in the universe (Sensei's Library) A comparison of chess & Go (British Go Association) Go & maths...the number of positions is scary (Wikipedia) There are also a lot of chess moves (Chess.com) What is a brute force attack? (Techopedia) How many moves ahead can hard core chess players see? (Quora) Deep learning in a nutshell – what it is, how it works, why care? (KDnuggets) Deep learning with massive amounts of computational power, machines can now recognize objects & translate speech in real time (MIT Technology Review) Google's 'DeepMind' deep learning start up (techworld) The Google Brain project (Wired) The Go computer was trained with 160,000 real-life games (Scientific American) Evolutionary computation & AI (Wikipedia) Genetic programming & AI (Wikipedia) So what's a robot then? 
(Galileo Educational Network) Professor reveals to students that his assistant was an AI all along (SMH) Hate Siri? Meet Viv - the future of chatbots and artificial intelligence (SMH) What is the connection between AI & robotics (wiseGEEK) Robotic limbs that plug into the brain (MIT Technology Review) The Roomba vacuum robot (iRobot) The 'Robot or Not' podcast (The Incomparable) Expert predictions on when we'll see conscious machines: When will the machines wake up? (TechCrunch) Google AI: What if Google became self-aware? (wattpad) Will Google create the first conscious computer? (Daily Mail Australia) Google Consciousness...not affiliated with Google (Google Consciousness) Elon Musk does indeed have an AI company: Open AI (Wired) Evil genius with a fluffy cat (Regmedia) The Maltesers gift box (Mars) Will machines eventually take on every job? (BBC) When robots take all the work, what'll be left for us to do? (Wired) The travelling salesman maths problem (Wikipedia) A bunch of stuff about the travelling salesman maths problem (University of Waterloo) GPS became fully operational in 1995, but was proposed in 1973 (Wikipedia) How does GPS work? (Wikipedia) Digital diagnosis: intelligent machines do a better job than humans (The Conversation) What is Lyme disease? 
(Lyme Disease Association of Australia) Stuttgart (Wikipedia) Robot B-9: the robot from Lost in Space (Lost in Space Wiki) Robot B-9 in action (YouTube) Surgical robots (All About Robotic Surgery) Commercial planes are basically just big drones (Esquire) The AI in Google's self-driving cars qualifies as legal driver (Fortune) All the self-driving cars are learning from each other (The Oatmeal) Your future self-driving car will be way more hackable (MIT Technology Review) Google self-driving cars have driven more than 2 million km & have only had 14 minor collisions (Wikipedia) Crazy animation of self-driving cars at an intersection (Co.Design) Self-driving cars could get their own lanes (wtop) Self-driving cars could lower insurance premiums (The Telegraph) Self-driving cars could lower insurance premiums (Wired) Australia's new National Broadband Network (nbnco) Tesla's cars now drive themselves, kinda (Wired) Australia's first autonomous vehicle test (Motoring) How AI is driving the next industrial revolution (InformationAge) Why bots are the next industrial revolution (Huffington Post) Humans need not apply: short video (C.G.P. Grey) Self-driving trucks are on the way (Basic Income) In the 2015 census there were 94,975 articulated trucks registered in Australia (Australian Bureau of Statistics) Driverless trucks move all iron ore at Rio Tinto's Pilbara mines, in world first (ABC Australia) Rio Tinto pushes ahead with driverless trains in Pilbara (SMH) Can Star Trek's world with no money work? (CNN Money) The economics of Star Trek (Medium) Jeff Bezos from Amazon (Wikipedia) Yes, the robots will steal our jobs. And that's fine (The Washington Post) I fear 'low-cost country sourcing' more than robots taking my job (Wikipedia) Flight prices are calculated by robots doing maths (Mathematical Association of America) Are airline passengers getting ripped off by robots? 
(Fortune) Is it true that once you search for a flight the algorithm will remember & put the price up? (Quora) Mac users may see pricier options (ABC America) Naked Wines Sir James Dyson (Encyclopaedia Britannica) Cheeky review? (If we may be so bold) It'd be amazing if you gave us a short review...it'll make us easier to find in iTunes. You're the best! We owe you a free hug and/or a glass of wine from our cellar