Over the past few weeks, Kaiser Fung has given us some valuable pointers on understanding the big data stats we are assaulted with on a daily basis. To sum up: learn the context behind the stats — sources and biases — and know that the algorithms that crunch the numbers may not have the answer to your problems. In this third segment of our podcast, Kaiser points out the ways stats can trick us through their inherently random nature — variability, in stats-speak.
Transcript
Cindy Ng: In parts one and two of our interview with Kaiser Fung, we discussed the process behind a numerical finding, then focused on accuracy. In our last installment, Kaiser reveals one last way to cultivate numbersense. Your third point is to have a nose for doctored statistics. And for me, it's kind of like...you don't know what you don't know? I was surprised to read in the school rankings chapter in Numbersense that different publications have different rules for ranking. And I didn't know that reporting low GPAs as "not available" is a magic trick that causes the median GPA to rise. So if I didn't know this, I would just take any number from any of these publications and use it in my marketing. How do I cultivate a nose for doctored statistics?
Kaiser Fung: Well, for a lot of people, I think it would involve reading certain authors, certain people who specialize in this sort of stuff. I'm one of them, but there are also others out there who have this sort of skepticism, and they will point out how...you know, I think it's all about figuring out how other people do it, and then you can do it too just by following the same types of logic. Oftentimes there are multiple stages to this. There's the stage of: can you smell something fishy? It's this awareness of, "Okay, do I want to believe this or not?" Then there's the next stage: once you smell something, do you know where to look, how to look, how to investigate it? Usually when you smell something, that means you have developed an alternative hypothesis or interpretation that is different from the thing you're reading. So in the scientific method, what you want to do at that point is go out and find corroborating evidence. Then the question becomes: do you have a notion of what kinds of things you could find that would help you decide whether you're right or whether the original person is right? And here the distinction is really about experience. If you're more experienced, you might know that finding one particular piece of information would be sufficient to validate or refute the claim, so you don't necessarily go through the entire analysis. Maybe you just find a shortcut to get to a certain point. And then the last stage, which is the hardest to achieve and also not always necessary, is: okay, if you no longer believe what was published, how do you develop your alternative argument? That requires a little more work, and that's the kind of thing I try to train my students to do. Oftentimes when I set very open-ended problems for them, you can see people in different stages. There are people who don't recognize where the problems are and just believe what they see. There are people who recognize the problems and are able to diagnose what's wrong.
Then there are the ones who can diagnose what's wrong and, usually by looking at some other data or other data points, can decide: okay, instead of making the assumptions that the original people made, which I no longer believe, I'm going to make a different set of assumptions. And if I make this other set of assumptions, what would be the logical outcome of the analysis? So I think it's something that can be trained. It's just difficult in the classroom setting, in our traditional textbook-lecture style. That type of stuff is very difficult to train.
Andy Green: Something you said connects to being able to train ourselves. One thing that comes up in your books a lot is that many of us don't have a sense of the variability in the data. We don't understand what it means, or what it would look like if we were to put it out on a bar chart; we don't have that picture in our mind. And one example you talk about, I think in a blog post, is something we do a lot as marketers: A/B testing. We'll do a comparison, changing one website slightly, then testing it and noticing that maybe it does better, we think. And then when we roll it out, we find it really doesn't make much of a difference. You talked about reasons why something might not scale up from an A/B test. I think you wrote about that for one of the blogs, maybe Harvard Business Review.
Kaiser Fung: I'm not sure whether we're saying the same things; I'm not quite remembering exactly what I wrote there. But from an A/B testing perspective, I think there are lots of little things that people need to pay attention to, because ultimately what you're trying to do is come up with a result that is generalizable, right? You run your test in one period of time, but in reality you would like whatever effect you find to hold over the next period of time. Now, both in this case and in what I talked about before, one of the core concepts in statistics is variability. Whatever number is put in front of you is just an at-the-moment measurement, right? It's like measuring your weight on the same scale: it's going to fluctuate morning, night, on different days. You don't have the notion that your weight has changed, but the actual measurement of the weight, even if it's still the same weight, will be slightly different. So that's variability, but the next phase is understanding that there are sources of variability. There are many different reasons why things are variable, and I think that's what we're getting into. In the case of A/B testing, there are many different reasons why your results may not generalize. One very obvious example is what we call drift in the population, meaning that, especially with websites, a site changes over time. So even if you keep it stable during the test, when you roll it forward it may have changed. And just a small change in some part of the website could actually produce a very large change in the type of people who come to the page. In the past I've done a lot of A/B testing around what you'd call the conversion funnel in marketing. And this is particularly an issue if, let's say, you're testing on a page that is close to the end of the funnel.
Now, people do that because it's the most impactful place, since the conversion rates are much higher on those pages. But the problem is that it's at the end of many steps. Anything that changed in any of the prior steps is potentially going to change the types of people who end up on your conversion page. So that's one source of variability in the type of people coming to your page: even if the result worked during the test, it may not work later. But there are plenty of other things, including something people oftentimes fail to recognize, which is that the whole basis of A/B testing is that you are randomly placing people into two or more buckets. The randomization is supposed to tell you that, on average, the groups are comparable. But while randomization will get you there almost all of the time, you can throw a coin 10 times and get 10 heads. There's a possibility that something is odd about your particular case. So another problem is: what if your particular test had this weird phenomenon? Now, in statistics we account for that by putting error bars around these things, but that still doesn't solve the problem that that particular sample was a very odd sample. One of the underlying assumptions of all the analysis in statistics is that you're not analyzing that rare sample; the rare sample is treated as outside the normal situation. So yeah, there are a lot of subtleties in how you would actually interpret these things. A/B testing is still one of the best ways of measuring something, but even there, there are lots of things it can't tell you. I also wrote about the fact that we like to say A/B testing gives you cause-and-effect analysis, but it all depends on what you mean by cause and effect. For the typical example, the red button versus the green button, the effect is not really caused by the color; the color change itself did not cause anything. There are some more intricate mechanisms there, so if you really want to talk about cause, you wouldn't say color is a cause, although under a particular way of interpreting this you can say the color is the cause.
Andy Green: Right, right.
Cindy Ng: It really just sounds like at every point you have to ask yourself, is this accurate? Is this the truth? It's a lot more work to get to the truth of the matter.
Kaiser Fung: Yes. So I think when people sell you the notion that somehow, because of the volume of the data, everything becomes easy, it's the opposite. I think that's one of the key points of the book. When you have more data, it actually requires a lot more work. And going back to the earlier point, when you have more data, the amount of potentially wrong analysis, of coming to the wrong conclusion, is exponentially larger. A lot of that is because most analysis, especially with data that is not experimental, not randomized, not controlled, essentially relies on a lot of assumptions. And when you rely a lot on assumptions, it's the proverbial thing: you can basically say whatever the hell you want with this data. So that's why I think it's really important, especially for people who are not in the business of generating analysis. If you're in the business of consuming analysis, you really have to look out for yourself, because in this day and age you really could say whatever you want with the data that you have.
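(A quick aside to make the coin-flip point above concrete. The sketch below is ours, not Kaiser's, and every number in it is invented for illustration: a 5% conversion rate shared by both variants, 2,000 visitors per arm, and a 20% relative lift threshold. It simulates many A/B tests in which the two variants are truly identical and counts how often one arm still appears to "win" by a wide margin purely through the luck of random assignment.)

```python
# Invented numbers only: both variants share the SAME true conversion rate,
# yet individual tests can still show sizable observed lifts purely through
# the luck of random assignment (the "10 heads in 10 coin flips" problem).
import random

TRUE_RATE = 0.05   # assumed conversion rate for both A and B
VISITORS = 2000    # visitors randomly assigned to each variant per test
TRIALS = 2000      # number of simulated A/B tests

def observed_rate(n, p):
    # Conversions among n independent visitors, each converting with prob p.
    return sum(random.random() < p for _ in range(n)) / n

lucky_wins = 0
for _ in range(TRIALS):
    a = observed_rate(VISITORS, TRUE_RATE)
    b = observed_rate(VISITORS, TRUE_RATE)
    # Count tests where B "beats" A by at least 20% in relative terms,
    # even though nothing actually differs between the variants.
    if a > 0 and (b - a) / a >= 0.20:
        lucky_wins += 1

print(f"Share of tests showing a >=20% relative lift by chance: {lucky_wins / TRIALS:.1%}")
```

Error bars, as Kaiser notes, quantify this kind of noise, but they cannot rescue you from the possibility that your one test happened to be the odd sample.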
Cindy Ng: So be a skeptic, be paranoid.
Kaiser Fung: Well, the nice thing is, when they're only talking about the colors of bicycles and so on, you can probably just ignore it and not do the work, because it's not that important a problem. But on the other hand, in the other case that is ongoing, the whole Tesla autopilot algorithm thing, right? In those cases, and also when people are now getting into healthcare and all these other areas where there's potentially a life-and-death decision, then you really should pay more attention.
Cindy Ng: This is great. Do you have any kind of final thoughts in terms of Numbersense?
Kaiser Fung: Well, this is a preview of a blog post that I'm going to put out probably this week. And I don't know if this works for you guys because it could be a bit more involved, but here's the situation. It basically reinforces the point that you can easily get fooled by the data. My TA and I were reviewing a data set that one of our students is using for her class project. It was basically some data about the revenue contributions of various customers and some characteristics of those customers. We were trying to solve the problem of: is there a way to use these characteristics to explain why the revenue contributions of different customers have gone up or down? We spent a bit of time thinking about it and eventually came up with a nice way of doing it (it's not an obvious problem), and we thought it actually produced pretty nice results. So then we met with the student, and pretty much the first thing we learned from the conversation is that, oh, because this is proprietary data, all the revenue numbers were completely made up. There was some formula or whatever that she used to generate the numbers. So that's an interesting dynamic, because on the one hand there's all the work we put into creating this model, and the reason we liked the model is that it produces nicely interpretable results. It actually makes sense, right? But it turns out that, yes, it makes sense in that imaginary world, but it really doesn't have any bearing on reality. And then the other side of this, which I touch upon in my book too, is that if you were to just look at the methodology of what we did and the model we built, you would say we did really good work, because we applied a good methodology and generated quick results. So the method and the data and your assumptions: all these things play a role in this ecosystem. And going back to what I was saying today, the problem is all this data. I think we have not spent sufficient time to really think about what the sources of the data are and how believable the data is. And in this day and age, especially with marketing data, with online data and all that, there's a lot of manipulation going on. There are lots of people who are creating this data for a purpose; think about online reviews and all these other things. On the analysis side, we have really not faced up to this issue. We basically take the data, we analyze it, we come up with models and we say things. But how much of any of those things would be refuted if we actually knew how the data was created?
Cindy Ng: That's a really good takeaway. You are working on many things, it sounds like. You're working on a blog, you teach. What else are you working on these days?
Kaiser Fung: Well, I'm mainly working on various educational activities that I hope will train the next generation of analysts, people who look at data and will hopefully have the numbersense I like to talk about. I have various book projects in mind which I hope to get to when I have more time. And from the numbersense perspective, I'm interested in exploring ways to describe it in a more concrete way. There's this general ecosystem of things that I've talked about, but I want a system that ties it together a bit, and so I have an ongoing effort to try to make it more quantifiable.
Cindy Ng: And if people want to follow what you're doing, what are your Twitter handle and your website?
Kaiser Fung: Yes, my Twitter is @junkcharts, and that's probably where most of my updates go. I have a personal website, kaiserfung.com, where people can learn more about what I do. I try to keep my speaking schedule updated there, because I do travel around the country to speak at various events. And they will also read about the other things I do for corporations, which are mostly around, again, training managers, training people in this area of statistical reasoning, data visualization, numbersense and all that.
In part one of our interview with Kaiser, he taught us the importance of looking at the process behind a numerical finding. We continue the conversation by discussing the accuracy of statistics and algorithms. With examples such as shoe recommendations and movie ratings, you'll learn where algorithms fall short.
Transcript
Cindy Ng: In part one, Kaiser taught us the importance of looking at the process behind a numerical finding. Today we'll continue, in part two, on how to cultivate numbersense. Kaiser, do you think algorithms are the answer? And when you're looking at a numerical finding, how do you know what questions to ask?
Kaiser Fung: There is obviously a big pile of questions you could ask, but I think the most important question not asked out there is the question of accuracy. I've always been struck, and I keep mentioning this to my blog readers, that if you open up any of the articles that are written, whether it's the New York Times or the Wall Street Journal (all these papers have big data articles, and they talk about algorithms, predictive models and so on), you can never find a quantified statement about the accuracy of these algorithms. They will all qualitatively tell you that the algorithms are amazing and wonderful. And really it all starts with understanding accuracy. In the Numbersense book I addressed this with the Target example and its propensity models, and in my previous book I talked about steroid testing and also lie detector testing, because it's all the same type of framework. It's really all about understanding the multiple different ways of measuring accuracy, starting with understanding false positives and false negatives; from those you derive other, more useful metrics. And you'll be shocked at how bad these algorithms are. I try to explain to people, too, that it's not that we're all snake oil artists, that these algorithms do not work at all. Usually they work if you compare them with not using an algorithm at all. So you actually get incremental improvements, sometimes pretty good improvements, over the case of not using an algorithm. However, if not using the algorithm leads to, let's say, 10% accuracy, and now we have 30% accuracy, you would be three times better. But 30% accuracy still means that 70% of the time you got the wrong thing, right? So there's an absolute versus relative measurement here that's important. Once you get into that whole area, it's very fascinating, because usually the algorithms also do not really make decisions; there are specific decision rules in place, because oftentimes the algorithm only calculates a probability of something. By analogy, the algorithm might tell you that there's a 40% chance of rain tomorrow, but somebody has to create a decision rule that says, you know, I'm going to carry an umbrella if it's over 60%. So there's all this other stuff involved. And then you also have to understand the soft side of it, which is the incentive of the various parties to go one way or the other. The algorithm ultimately reflects its designers, because the algorithm will not make the determination of whether you should bring an umbrella, whether it's over 60% or under 60%. All it can tell you is that for today it's 40%.
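(To make the absolute-versus-relative accuracy point concrete, here is a small sketch with invented numbers. The 10% base rate, 80% catch rate, and 21% false-flag rate are assumptions chosen only to mirror the 10%-versus-30% contrast above; they are not figures from the Target case or any real system.)

```python
# Invented numbers only: a hypothetical flagging algorithm applied to
# 10,000 cases, of which 1,000 are genuine (a 10% base rate).
positives = 1000
negatives = 9000

# Assumed performance: the model catches 80% of genuine cases but also
# flags about 21% of the non-cases by mistake.
tp = int(0.80 * positives)   # 800 genuine cases correctly flagged
fn = positives - tp          # 200 genuine cases missed (false negatives)
fp = int(0.21 * negatives)   # 1,890 non-cases wrongly flagged (false positives)
tn = negatives - fp          # 7,110 non-cases correctly cleared

precision = tp / (tp + fp)   # of everything flagged, how much is genuine?
recall = tp / (tp + fn)      # of the genuine cases, how many were caught?

print(f"false positives: {fp}, false negatives: {fn}")
print(f"precision: {precision:.0%}, recall: {recall:.0%}")

# Flagging at random would be "right" only about 10% of the time (the base
# rate). Roughly 30% precision is three times better in relative terms,
# yet about 70% of the flagged cases are still wrong in absolute terms.
```

Note that none of this is a decision yet: as with the umbrella example, the model only supplies numbers, and someone still has to choose the decision rule that acts on them.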
So I think the notion that the algorithm is running all on its own is false anyway. And once you have human input into these algorithms, you also have to wonder about what the humans are doing. In a lot of these books I try to point out that what also complicates it is that in every case, including the case of Target, there will be different people coming at this from angles where they are trying to optimize conflicting objectives. That's the beginning of this sort of asking questions about the output. And I think if we start doing that more, we can avoid some of this. A very recent situation that relates to our conversation here is the whole collapse of this…company. I'm not sure if you guys have been following that. Well, it's an example of somebody who's been selling this algorithm that people have been asking about. A lot of people have not been asking for quantifiable results, and the people who have been asking for quantifiable results have basically been pushed back on; you know, they refused all the time to present anything. And at this point, I think it's been acknowledged that it's all, you know, empty, it's hot air.
Andy Green: Right, yeah. You had some funny comments, I think on your blog, and this is related to these algorithms, about buying shoes on the web, on one of the websites. And you were saying, well, they were coming up with some recommendations for other types of items they thought you would be interested in. And what you really wanted was to go into the website and, at least when you went to buy the shoe, have them take you right to the shoe size that you ordered in the past, or the color that you ordered.
Kaiser Fung: Right, right, yes.
Andy Green: And that would be the simple, obvious thing to do, instead of trying to come up with an algorithm to figure out what you might like and making suggestions...
Kaiser Fung: Yeah. So I think there are many ways to think about that. Part of it is that oftentimes the most unsexy problems are the most impactful, but people tend to focus on the sexiest problems. In that particular case, the whole article was about the idea that what makes prediction inaccurate is not just the algorithm being bad; the algorithms oftentimes are actually not bad. It is that the underlying phenomenon you are predicting is highly variable. I love to use examples like movies, since movie ratings were really big some time ago. How you rate a movie is not some kind of constant. It depends on your mood, it depends on what you did, it depends on who you are with. It depends on so many things. The same person seeing the same movie in different settings would probably give different ratings. So in that sense, it is very difficult for an algorithm to really predict how you're going to rate the movie. But what I was pointing out is that there are a lot of other things these algorithms could predict that have what I'd call an essentially invariant property. A great example is the fact that almost always, it's still not a hundred percent but 90% of the time, you're buying stuff for yourself, and therefore you have certain shirt sizes, shoe sizes and so on. And therefore it would seem reasonable that they should just show you the things that are appropriate for you. And it's not a very sexy type of prediction.
But it is a kind of prediction. And there are many, many other situations like that. If you just think about even using email software, there are certain things that you click on there… because the way it's designed is not quite the way you use it. We have all the data available, they're measuring all this behavior, and it could very well be predicted. So I feel for everybody who has to do the same clicks every time, because they're very much like, "Well, I just say what I mean."
In the business world, if we're looking for actionable insights, many think they are found using an algorithm. However, statistician Kaiser Fung disagrees. With degrees in engineering and statistics and an MBA from Harvard, Fung believes that both algorithms and humans are needed, because the whole is greater than the sum of its parts. Moreover, the worldview he suggests one should cultivate is numbersense. How? When presented with a numerical finding, go the extra mile and investigate the methodology, biases, and sources. For more tips, listen to part one of our interview with Kaiser as he uses recent headlines to dissect the problems with how data is analyzed and presented to the general public.
Transcript
Cindy Ng: Numbersense essentially teaches us how to make simple sense out of complex statistics. However, statistician Kaiser Fung said that cultivating numbersense isn't something you can learn from a book, but there are three things you can do. First, you shouldn't take published data at face value. Second, know what questions to ask. And third, have a nose for doctored statistics. So, the first bullet is you shouldn't take published data at face value. To me, that means it takes more time to get to the truth of the matter, to the issue at hand. And I'm wondering also to what extent the volume of data, big data, affects fidelity, because that certainly affects your final result?
Kaiser Fung: There are lots of aspects to this. Let's start with the idea that, well, it's kind of a hopeless situation, because you pretty much have to replicate or check everything that somebody has done in order to decide whether you want to believe the work or not. I would say, in a way that's true, but over time you develop shortcuts. Part of it is that if you have done your homework on one type of study, then you can apply those lessons very easily to a different study, so you don't have to actually repeat all that. Also, organizations and research groups tend to favor certain types of methodologies. So once you've understood what they are actually doing and what the assumptions behind the methodologies are, you have developed some idea about whether you're a believer in their assumptions or their methods. Also, over time, there are certain people whose work I have come to appreciate. I've studied their work; they share some of my own beliefs about how to read data and how to analyze data. So it also depends on who is publishing the work. I think part one of the question is to encourage people to not just take what they're told but to really think about what they're being told, and there are some shortcuts to that over time. Going back to your other issue related to the volume of data, I think that is really causing a lot of issues. And it's not just the volume of data but the fact that the data today is not collected with any design or plan in mind. Oftentimes the people collecting the data are really divorced from any kind of business problem, divorced from the business side of the house. The data has just been collected, and now people are trying to make sense of it. I think you end up with many challenges. One big challenge is that you don't end up solving any problems of interest. So I just had a write-up on my blog, just this weekend.
And this is related to somebody's analysis of, I think, Tour de France data. There was this whole thing about, "Well, nowadays we have Garmin and we have all these devices, they're collecting a lot of data about these cyclists, and there's not much done in terms of analysis," they say. Which is probably true, because again, all of that data has been collected with no particular design or problem in mind. So what do they do? Well, they basically say, "I'm going to analyze the colors of the bikes that have won the Tour de France over the years." But that's kind of the state of the world we're in. We have the data, and then we try to retrofit it by forcing it to answer questions that we make up after the fact. And oftentimes these questions are actually very silly and don't really solve any real problems, like the color of the bike; I don't think anyone believes it impacts whether you win or not. That's just an example of the types of problems we end up solving, and many of them are very trivial. And I think the reason we are there is that when you just collect data like that (let's assume this data measures how fast the wheels are turning, the speed of your bike, all that type of stuff), the problem is that when you don't have an actual problem in mind, you don't actually have all of the pieces of data you need to solve a problem. Most often what you don't have is an outcome metric. You have a lot of this explanatory data, but there's no measurement of the thing that you want to impact. In order to get that, you have to merge in a lot of data or try to collect data from other sources, and oftentimes you cannot find appropriate data, so you're stuck in this loop of not having the ability to do anything. So I think that's the paradox of the big data age: we have all this data, but it is almost impossible to make it useful in a lot of cases. There are many other reasons why the volume of data is not helping us, but I think what flashed in my head right now because of … is that one of the biggest issues is that the data is not solving any important problems.
Andy Green: Kaiser, getting back to what you said earlier about not just accepting what you're told. I've also now become a big fan of your blog, Junk Charts. And there was one post, I think it's pretty recent, where you commented on a New York Times article on CEO pay. You actually looked a little deeper into it and came to sort of the opposite conclusion. Can you talk about that a little bit? Because the whole approach there has to do with Numbersense.
Kaiser Fung: Yeah. So basically what happened was there was this big headline about CEO pay. And it was one of these sort of counter-intuitive headlines that basically said, "Hey, surprise," sort of a surprise, CEO pay has dropped. It even gave a particular percentage in the headline, which I can't remember. And I think the Numbersense part of this is that when I read something like that, for certain topics like this particular topic, since I have an MBA and I've been exposed to this type of analysis, I kind of have some idea, some preconceived notion in my head, about where CEO pay is going.
And so it kind of triggers a bit of doubt in my head. What you want to do in these cases, and this is an example of a very simple thing you can do, is just click on the link in the article, go to the original report, and start reading what they say. In this particular case, you actually only need to read literally the first two bullet points of the executive summary of the report, because then immediately you'll notice that CEO pay has actually gone up, not down. It all depends on what metric people use, and both are actually accurate from a statistical perspective. The metric that went up was the median pay, the middle person. The number that went down was the average pay. And here you need a little bit of statistical training, because you have to realize that CEO pay is an extremely skewed number. Even at the very top, and I think they only talk about the top 200 CEOs, the top person is making something like twice the second person. This is a very, very steep curve. So the average is really meaningless in this particular case, and the median is really the way to go. So I basically blogged about it and said that it's a really poor choice of headline, because it doesn't represent the real picture of what is actually going on. So that's the story. And that's a great example of what I like to tell people: in order to get to that level of reasoning, you don't really need to take a lot of math classes, you don't need to know calculus. I think it's a misconception, perpetuated by many, many decades of college instruction, that statistics is all about math and you have to learn all these formulas in order to go anywhere.
Andy Green: Right. I love the explanation. It seems that if the Times had just shown a bar chart...it would have been a little difficult, but what you're saying is that at the upper end there are CEOs making a lot of money, and they just dropped a little bit. And correct me if I'm wrong, but everyone else did better, or most, like 80% of the CEOs or whatever the percentile is, did better. But those at the top, because they're making so much, lost a little bit, and that dropped the average. Meanwhile, if you polled CEOs, whatever the number is, 80% or 90% would say, "Yes, my pay has gone up."
Kaiser Fung: Right, yeah. I did look at the exact numbers there; I don't remember what they are, but conceptually speaking, given this type of distribution, it's possible that just the very top guy dropping by a bit will be sufficient to make the average move. Meanwhile, the median, the middle guy, has actually moved up, which implies that the bulk, the weight of the distribution, has actually gone up. There are many different levels to this that you can talk about. The first level is getting the idea that you really want to look at the median. And if you really want to dig deeper, which I did in my blog post, you also have to think about what components drive CEO pay, because they're counting not just the fixed base salary but maybe also bonuses, and maybe they even price in the stock components, and you know the stock components are going to be much more volatile.
It all points to the fact that you really shouldn't be looking at the average, because it's now so affected by all these other ups and downs. So to me, it's a basic level of statistical reasoning that unfortunately doesn't seem to have improved in the journalistic world. Even in this day and age, when there's so much data, they really need to improve their ability to draw conclusions. That's a pretty simple example of something that can be improved. We also have a lot of examples of things that are much more subtle. I'd like to give a different example of this, and it also comes from something that showed up in the New York Times some years ago. It was a very simple scatter plot that was trying to correlate the average happiness of people in different countries, typically measured by survey results where you rate your happiness on a scale of zero to ten or something like that, with what they call the progressiveness of the tax system in each of these countries. The thing that people don't understand is that by making this scatter plot, you have actually imposed upon your reader a particular model of the data. In this particular case, it is the model that says happiness can be explained by just one factor, which is the tax system. In reality, there are a gazillion other factors that affect somebody's happiness. If you know anything about statistics, you would learn that it's multivariable regression that would actually control for all the other factors. But when you do a scatter plot, you haven't adjusted for anything else. So a very simple analysis could be extremely misleading.
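(To see the median-versus-average point from the CEO pay discussion in miniature, here is a small sketch. The pay figures are invented for illustration, not taken from the Times article or the underlying report; they are simply arranged so that the very top pay falls while everyone else's rises, mimicking the steep skew Kaiser describes.)

```python
# Invented pay figures (in $ millions) for a hypothetical top-10 list,
# loosely mimicking the skew described above: the #1 pay is roughly
# twice the #2 pay, and the curve is very steep.
last_year = [98, 52, 40, 33, 28, 25, 22, 20, 18, 16]
this_year = [70, 54, 42, 35, 30, 27, 24, 22, 20, 18]  # top pay falls, everyone else rises

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

print(f"average: {mean(last_year):.1f} -> {mean(this_year):.1f}")      # 35.2 -> 34.2, falls
print(f"median:  {median(last_year):.1f} -> {median(this_year):.1f}")  # 26.5 -> 28.5, rises

# A headline built on the average says pay dropped; the median, which
# tracks the typical CEO on the list, says it rose. Both are "accurate."
```

With a distribution this skewed, the average is dragged around by the top earner while the median tracks the typical CEO, which is why the two metrics can move in opposite directions and support opposite headlines.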
With one sensational data breach headline after another, we decided to take on the details behind the story, because a concentrated focus on the headline tends to reveal only a partial dimension of the truth. For instance, when a bank’s sensitive data is compromised, it depends on the how as well as the what. Security practitioner Mike Buckbee said, “It’s very different if your central data storage was taken versus a Dropbox where you let 3rd party vendors upload spreadsheets.” We’re also living in a very different time, when everything we do in our personal lives can potentially end up on the internet. Thanks to the EU’s “right to be forgotten” law, the public has made 2.4 million Google takedown requests. Striking the perfect balance will be difficult: how will the world choose between an organization’s goal (to provide access to the world’s information) and an individual’s right to be forgotten? And when organizations want to confidently make business decisions based on data-driven metrics, trusting the data is critical to making the right decision. Our discussion also reminded me of what our favorite statistician Kaiser Fung said in a recent interview: “Investigate the process behind a numerical finding.” Other articles discussed: Why I quit Google; Balancing security and user productivity; 12 best practices for user account, authorization and password management. Tool of the week: Bettercap. Panelists: Cindy Ng, Kilian Englert, Forrest Temple, Mike Buckbee
In many ways, Avinash is the king of "the side hustle":
Speaker & Advisor - An established speaker who has consulted for some of the world’s largest brands
Entrepreneur - Recently sold the education company he co-founded for a reported $10M
Writer - A thought leader in the world of analytics who wrote the best-selling book “Web Analytics: An Hour A Day” and built a large following on his blog, Occam’s Razor
Googler - Leads a team of 70 storytellers as Google’s Digital Marketing Evangelist
I’ve known Avinash for a few years from our time at Google. But in prep for our interview, I realized how little I knew about his personal story. Who was the real Avinash behind the guy on stage frantically shouting "THIS SUCKS" at bewildered CEOs? What experiences shaped his outlook, and how did he end up in such a unique role at Google? What role did his writing play in shaping where he is today? He hasn’t shared much of this before, so I’m thankful we had a chance to get personal and discuss his incredible journey: from living in a factory as a kid in India, to building a life in the US, to becoming a best-selling author and thought leader in his space. Here are five things to listen out for:
(1) Early Years - Growing up poor and living in a factory in India
(2) Starting His Career - Moving to Saudi Arabia, then the US, and why he was attracted to web analytics
(3) Writing & Building an Audience - How he built an audience of 150k monthly readers on his blog, wrote a book, and landed a high-profile role at Google
(4) Starting From Scratch - If he were starting again, how would he focus his time to build his brand and audience?
(5) Personal - His fear of being broke, what motivates him, and what success means to him personally
All show notes and details can be found on: www.creatorlab.fm
Time Stamps: Background of how Bilal & Avinash know each other [1m38s] Explaining what his job is to his mum [2m51s] How he got his job at Google [3m44s] Growing up poor in India & living in a factory [5m16s] What he wanted to be when he grew up [7m38s] Where his work ethic came from [10m31s] Moving to Saudi Arabia & the USA [12m44s] Earning $500/month - still 7x more than he would have made in India [14m16s] What it felt like to be poor [18m10s] How he uses the fear of going broke and losing his job to motivate himself [20m45s] Losing his job again at Silicon Graphics [26m17s] Why it can make sense to take a title & pay cut [28m40s] What attracted him to web analytics [29m46s] Taking the plunge to start a company and knowing when it’s the right time to go for it [30m28s] Starting a blog and the role of personal branding [32m40s] Why he was wrong about people not wanting to pay for a book when they could get it online for free [38m22s] Taking 6 months to get to 1,000 visitors, now at 150,000 visitors a month [41m15s] Content vs Amplification [43m39s] “If I can listen to 50 podcasts, why should I listen to you?” [45m35s] Career limiting moves [51m33s] Owned vs rented platforms, e.g. email vs social [58m39s] Why he gets more engagement from 9.5k email subscribers than 200k Twitter followers [1hr47s] Advice for growing an email list [1hr2m14s] What’s motivating him [1hr3m53s] Can you be happy if you always want to stand out?
[1hr5m43s] Spending 4 hours a week to learn something new [1hr7m12s] Advice to 18-year-old Avinash [1hr8m15s] What success means to him [1hr9m28s] 4 people he mentions [1hr9m40s]:
1) Thomas Baekdal - writes about media, analytics, social and broad digital trends
2) Mitch Joel - a marketing rockstar; his blog illustrates why
3) Seth Godin - his incisiveness and pithiness, and of course his take on marketing, are legendary
4) Kaiser Fung - incredible with big data, advanced math and visualizations
Examples of failure [1hr10m38s] What matters most [1hr15m5s]
All show notes and other episodes available at: https://www.creatorlab.fm
Stay connected: https://www.creatorlab.fm/subscribe https://www.facebook.com/creatorlabfm https://www.instagram.com/creatorlabfm https://www.twitter.com/creatorlabfm
Connect with Bilal: https://www.twitter.com/bzaidi https://www.instagram.com/bzaidi212
David and Tamler take a break from their main jobs as TV critics to talk about a masterpiece in political philosophy: "Two Concepts of Liberty" by Isaiah Berlin. While they both celebrate the style and substance of this classic essay, in a startling twist Tamler praises conceptual analysis and David expresses a few misgivings about his Kantianism. What is the elusive idea of positive liberty, and how can its pursuit lead to totalitarian rule? When is it more important to buy boots than read Russian poetry? And why is David still so depressed by pluralism? Plus, coddling in Wisconsin? And another famous set of social psych studies is accused of biting the dust.
Links
"In Wisconsin, Efforts to End Taunting at Games Lead to Claims of Coddling" by Mike McPhate [nytimes.com]
"Take My Breath Away" by Berlin [youtube.com]
Cortex Podcast Episode #20 [relay.fm]
Amy Cuddy, "Your Body Language Shapes Who You Are" TED Talk [ted.com]
"The Power of the Power Pose: Amy Cuddy's Famous Finding Is the Latest Example of Scientific Overreach" by Andrew Gelman and Kaiser Fung [slate.com]
Berlin, I. (1958) "Two Concepts of Liberty." In Isaiah Berlin (1969) Four Essays on Liberty. Oxford: Oxford University Press. [verybadwizards.com]
Positive and Negative Liberty [plato.stanford.edu]
Freedom: Block Distractions
In the weeks before the INFORMS Analytics conference, ComputerWorld columnist Thornton May podcast some hints on how he would interact with his panelists. Hear him after the fireworks on the surprising insights from FICO's Larry Rosenberg, Gartner VP Bill Hostmann, Sirius XM Radio's Kaiser Fung, and Scott Friesen of Best Buy.
Guest: Kaiser Fung Host: Matt Birnholz, MD Host: Michael Greenberg, MD In this rebroadcast of the March 17th show, your hosts talk to Kaiser Fung, author of the book Numbers Rule Your World: The Hidden Influence of Probability and Statistics on Everything You Do, about controversies in PSA screening, mammogram guidelines, personalized medicine and placebos. Plus: Smartphones, primary care salaries, and more.