Measuring your talk time? Counting your filler words? What about "analyzing" your "emotions"? Companies that push LLM technology to surveil and summarize video meetings are increasingly offering to (purportedly) analyze your participation and assign your speech some metrics, all in the name of "productivity". Sociolinguist Nicole Holliday joins Alex and Emily to take apart claims about these "AI" meeting feedback tools, and reveal them to be just sparkling bossware, with little insight into how we talk.

Nicole Holliday is Acting Associate Professor of Linguistics at the University of California, Berkeley.

Quick note: Our guest for this episode had some sound equipment issues, which unfortunately affected her audio quality.

Main course:
Read AI Review: This AI Reads Emotions During Video Calls
Marketing video for Read AI
Zoom rebrands existing and introduces new generative AI features
Marketing video for Zoom Revenue Accelerator
Speech analysis startup releases AI tool that simulates difficult job interview conversation

Fresh AI Hell:
Amazon Echo will send all recordings to Amazon beginning March 28
Trump's NIST no longer concerned with "safety" or "fairness"
Reporter Kevin Roose is feeling the bullshit
UW's eScience Institute pushing "AI" for information access
OpenAI whines about data being too expensive, with a side of Sinophobia

Check out future streams on Twitch. Meanwhile, send us any AI Hell you see.

Our book, 'The AI Con,' comes out in May! Pre-order now. Subscribe to our newsletter via Buttondown.

Follow us!
Emily: Bluesky: emilymbender.bsky.social / Mastodon: dair-community.social/@EmilyMBender
Alex: Bluesky: alexhanna.bsky.social / Mastodon: dair-community.social/@alex / Twitter: @alexhanna

Music by Toby Menon. Artwork by Naomi Pleasure-Park. Production by Christie Taylor.
Send us a text

English edition.

https://www.esciencecenter.nl Home page of the eScience Center
https://www.esciencecenter.nl/calls-for-proposals/fellowship-open-call/ Fellowship programme at the eScience Center

Support the show

Thank you for listening! Merci de votre écoute! Vielen Dank fürs Zuhören!

Contact details / Coordonnées / Kontakt:
Email: code4thought@proton.me
UK RSE Slack (ukrse.slack.com): @code4thought or @piddie
US RSE Slack (usrse.slack.com): @Peter Schmidt
Mastodon: https://fosstodon.org/@code4thought or @code4thought@fosstodon.org
Bluesky: https://bsky.app/profile/code4thought.bsky.social
LinkedIn: https://www.linkedin.com/in/pweschmidt/ (personal profile)
LinkedIn: https://www.linkedin.com/company/codeforthought/ (Code for Thought profile)

This podcast is licensed under the Creative Commons Licence: https://creativecommons.org/licenses/by-sa/4.0/
Send us a text message.

What a fun day that was: the National Research Software Day in Hilversum on 23 April 2024. It was organised by the eScience Center in the Netherlands, and in the course of this episode you will hear from one of the organisers, Lieke, keynote speaker Jenny Bryan, and members of a panel discussion on research infrastructure. But also from four artists, because the organisers added an art exhibition to the one-day conference. The sound samples you can hear in the episode are from one of the artists, Christian Schwarz.

https://www.esciencecenter.nl/national-research-software-day-2024/ The conference home page
https://www.beeldengeluid.nl The home page of the Netherlands Institute for Sound and Vision
https://jennybryan.org Jenny Bryan's home page (keynote speaker)
https://www.ru.nl/en/people/kievit-r-a Rogier Kievit (keynote speaker)
https://ilar.xyz Christian Schwarz (artist)
https://soundcloud.com/ilarxyz Christian on SoundCloud
https://xingkuangyi.com Kuangyi Xing (artist)
https://www.eusebijucgla.com/collaborations/hyperview-barcelona Eusebio Jucgla (artist)
https://guidovanderkooij.nl Guido van der Kooij (artist)

Support the show on Patreon: https://www.patreon.com/codeforthought
In this second report on research software archives and directories, I talk to Jason Maassen from the eScience Center in the Netherlands and Christian Meessen from the Helmholtz Association in Germany. The eScience Center developed the Research Software Directory (RSD) to show the impact software has on research. Researchers who code can register their software with the directory from all over the world. Helmholtz has adopted the RSD platform to produce a Helmholtz-specific version, and uses it to monitor and track development within the various research centres in Helmholtz.

Here are some links:
https://research-software-directory.org/ The portal to the RSD from the eScience Center in NL
https://www.esciencecenter.nl The eScience Center
https://helmholtz.software The RSD Helmholtz site
Getting funding for software engineering in research is an ongoing challenge. The Research Software Alliance (ReSA), together with the eScience Center in the Netherlands, organised a two-day workshop in Amsterdam on 8-9 November 2022 to see how we can get on top of it. I invited the director of ReSA, Michelle Barker, and Joris van Eijnatten, director of the eScience Center, to talk about the workshop and the declaration that came out of it.

Links:
https://adore.software The official Amsterdam Declaration site
https://future-of-research-software.org The workshop page
https://zenodo.org/record/7330542#.ZCHG7C8w3uc The draft declaration
https://www.researchsoft.org ReSA, the Research Software Alliance
https://www.esciencecenter.nl/ The eScience Center in the Netherlands
https://www.esciencecenter.nl/news/global-support-for-funding-sustainable-research-software-in-amsterdam/ Workshop report

Other links mentioned:
https://eosc-portal.eu European Open Science Cloud
https://wellcome.org Wellcome Trust
https://chanzuckerberg.com The Chan Zuckerberg Initiative
Hosted by: Austin Willson, Michael O'Connor

Hey everyone, and welcome back to The Long Run Show. This is your host, Michael O'Connor, here with Austin Willson. Always good to be back. Today we're going to be talking about something that is near and dear to my heart and to my career as well. I think Austin is going to be peppering me with some questions, and I'm interested to hear his takes. We're going to talk about artificial intelligence: where it's been, where it is now, where it's going, and most importantly, how it relates to the long run. Is it a big deal in the long run? Is it a threat to humanity, as Elon Musk has said? I'm excited to hear your thoughts on all that, Austin.

Yeah. What's really great about this episode is that you have a lot of experience with it. You have a lot of experience with deep learning and neural networks and all of that, so this is a great time to crack open that brain of yours and see what comes out. That sounds pretty disgusting when I put it that way, yeah. But really, I think artificial intelligence is overhyped right now. Just to get this out of the way: I think any investment you're making at the moment expecting humongous returns from artificial intelligence is a really bogus buzzword investment. Now, I would love for you to tell me that I'm wrong. I just think the companies that are doing it right now are either private, or they're so large that AI is only one small department within a huge company. You're not really getting exposure to the idea or the innovation of artificial intelligence; you're just hopping on some sort of bandwagon. Unless, like I said, you're buying a private company. But if you're buying a private company, I want to talk to you, so if you're listening to this show, hit me up on LinkedIn.
With that being said, I think artificial intelligence needs to be on everyone's mind, because it seems like it has some very large implications for the future. For instance: what if we had an AI algo bot trading equities? That could be wild. They're not doing that right now, so there we go. But what would that mean for markets and price discovery? That's somewhat of a passive-versus-active debate, but it could have large implications for the financial system. Beyond that, at the meta level, that's where I'm really interested in the long-run future of artificial intelligence and humanity's relationship to it. So first off, I think it'd be helpful to define the terms here. Artificial intelligence: what does that even mean? Can you define it simply for us?

Sure. The very academic answer: it's a non-human system that performs some sort of external learning function, in which the standard markers of intelligence can be found. Does that satisfy your question? Yeah, can you sum that up a little bit, just to make it more palatable? Sure. I'd say a lot of people think of artificial intelligence as one big thing that's out there, as in "there is an artificial intelligence", like a Skynet or one specific Jarvis or Ultron. But artificial intelligence is simply the category of many different ways of
creating systems that can replicate aspects of human intelligence on their own. We have machine learning, we have deep learning, we have neural networks; there's a wide breadth of different things that all fit inside of artificial intelligence. So it's not a singular thing, but rather a category. That's really helpful to understand. So within that category, you mentioned all three: deep learning, machine learning, and neural networks. Can you break those down for us real quick? I'm sure there's some cross-pollination between all three.

Sure. There's more to it, but those three are probably the most commonly referenced systems and ideas in artificial intelligence. Machine learning is actually another category: it describes certain processes that can create human-like intelligence in machines. Deep learning is a specific form of machine learning built on neural networks. A neural network is a specific kind of model; there are lots of black-box-style ones with activation functions, and you can also dig really deep into what's actually going on inside a neural network. So a deep neural network is a form of machine learning, which in turn is a form of artificial intelligence. Does that make sense? Yes, it's about as clear as mud. No, it does make sense.
I'll tell you, from a marketing perspective, this whole space needs a rebrand, because you just said "black box" and "artificial intelligence", and that sounds like something scary that I really should be afraid of. No wonder everyone and their brother is afraid of artificial intelligence. And like you said, "artificial intelligence" in the singular sounds like the antagonist in some sort of sci-fi story, and there are hundreds of sci-fi stories like that. We might be talking about that a little bit today. But with all that groundwork laid, I want to go back to my original statement about artificial intelligence being, excuse me, not a fad, but a buzzword right now. Do you think that's true, or was I totally overstepping my bounds in saying it's a buzzword when it comes to, quote unquote, investing in artificial intelligence? Does that track with what you know of the space?

It's interesting, because the answer is complicated, and I appreciate that you mentioned that you feel like there aren't any easy ways to invest. I can personally share three individual stock picks that I personally own and consider solid plays if you want to get into AI. Again, not financial advice. Well, do share, because I'm right here and I'd like to know. Yeah, exactly. The biggest, or at least the most vocal, because it's in their name and their ticker, is C3 AI; the ticker is literally AI. They are very specifically an artificial intelligence company. You can routinely see their full-page ads in the Wall Street Journal.
They do a variety of different work, and as their name says, their whole MO is artificial intelligence. That being said, they definitely have other data business, and they're not quite the same thing as an AI ETF, which was going to be my next pick. Let me find it... where's the AI ETF? It turns out I'm in the quantum computing ETF, so I don't actually own an AI ETF, but I'm pretty sure there's a ProShares or some other AI ETF out there. The other two picks I'm directly invested in right now, again not financial advice: one is Palantir. Palantir has become a pretty meme stock, and it's definitely gotten hit hard in the last couple of months as of recording this, but I personally am long-term bullish on Palantir. I think data structures and artificial intelligence are really critical, especially because Palantir is so laser-focused on data, specifically in defense contracting. We've heard Elon Musk say that AI is going to be used as a weapon in warfare, and I'm sure it already is. I'd imagine the first AI tools used in warfare probably came from DARPA or Lockheed Martin or Raytheon or the other very classic legacy companies. But Palantir is so laser-focused on it that I think the next big developments in AI for government contracting, whether for war or for defense or for municipalities, are probably going to come out of Palantir. From what I've heard, there are people on both sides of the aisle: people who say Palantir is terrible, short it, and lots of people who say buy Palantir whatever you do. I'm in the middle; I'm focused on what they're innovating in.
And I'd say, for me, it's a long play. I'm not looking for a very short-term gain on Palantir; I'm looking at more of a three-to-eight-year outlook and expecting some innovation. So that's Palantir. I'd say most people have probably heard of Palantir since it became a meme stock, and of C3. I actually stumbled into Palantir and now own some shares without knowing at first that they were AI-involved. Oh wow, okay. Again, they're not the same as C3, with AI as their ticker; Palantir does some AI but isn't a totally AI company. Still, I think they're a good AI play. Gotcha. And then the last one, the least well known, is Itron, ticker ITRI. Technically they're in power and energy: they make grid equipment and load-forecasting software for electrical companies and for the Department of Energy. Everyone has one of their meters on their house; they make an enormous number of smart meters and internet-of-things applications for electrical power. They've gotten hit in the past three months as well, a similar trend to Palantir, but I am pretty solidly bullish long-term on Itron. Number one, both sides of the political aisle are calling for changes in our electrical grid infrastructure: the more environmental side calls for more and more renewables, and the more hawkish side calls for better power grids, whether that's more nuclear or more hydroelectric or more oil and gas. I think there's always going to be a need in modern society for innovations in electrical power. And Itron is one of the unique ones, I think, from the research that I've done.
They don't directly build power lines or anything like that, but they provide the systems and the software that help electrical companies and federal and state governments run and operate those systems as efficiently as possible. Specifically, they have some of the best tools, mainly neural networks and deep-learning AI, for forecasting electrical loads and grid loads for these electrical companies. I've seen a lot of innovation coming out of Itron lately, and I think they're a sleeper play. With innovations in electrical load, they're one of those ancillary plays you wouldn't necessarily think of versus a BP or a General Electric, but I think they could be a really strong play, especially using AI for industrial and energy-grid use cases.

Interesting. You bring up some interesting points. Maybe I'm thinking about it wrong, but I still go back to my previous statement, like this is a debate or something: it seems like there's no great way to get exposure directly to that innovation, except for maybe this C3 AI, and this is the first time I'm hearing of that company. But you also mentioned that they're definitely focused on data at the moment. So I wonder if it's one of those, and I don't mean this in the accusatory tone it's going to come out in, sleight-of-hand moves companies often play, where they say they're working on the buzzword trend while the revenue sustaining the company comes from some legacy business line that sits adjacent to the new innovation, and the new innovation isn't necessarily producing any revenue. Again, I'm sounding like a total bear on AI, but it is sometimes
frustrating, because, for instance, I really dove into quantum computing, and yes, there's a quantum computing ETF, but for a lot of those companies most of the revenue isn't tied to quantum computing yet. It's at such a young stage that there's just not a lot of public money that can get at it. So I appreciate you pulling up those three, but it seems like maybe I'm approaching it wrong. Maybe this whole AI innovation lives inside current businesses, so AI isn't a product or a sector; it's more meta than that, and covers multiple sectors and multiple products and processes on the back end. Maybe I'm thinking of it wrong, I don't know. What's your take on that?

I think that's a good point, because if you just Google "best AI stocks", you get Nvidia, Alphabet, Amazon, Microsoft, IBM, and some semiconductor companies. Most of them aren't getting significant revenue from AI. Amazon and IBM are possible exceptions, because Amazon is getting an unbelievable amount of revenue from AWS, and they're building some really incredible AI solutions in AWS. And I've actually been even more impressed with what IBM is building with IBM Watson and their cloud systems. A little background: I started a very small boutique AI consulting company for a short period of time and did some work directly in the field with small-to-medium-sized businesses, helping them implement tools and software, and I was really impressed with what IBM offers. I think IBM could be another great pick if you want to invest in what is probably the future of AI but don't necessarily want to be completely leveraged on AI.
I think IBM is a great stock, and again, I do own shares in IBM, full disclosure, and again, not financial advice. But IBM might actually not be an exception to what you're saying, because I think it's true: AI is tools, not necessarily a specific product. The products that come out of AI are things like IBM Watson and AWS; ultimately they're data systems, cloud data systems. Stocks like Snowflake or Qualcomm are more plays in that zone. Like you said, it's not like buying Apple because you see the iPhone and want to invest in the company that makes that product. The roots of AI are very academic and very open. You can go online and look up ways to create your own neural networks, and you can make a neural network in a day. That doesn't mean it's going to solve a huge business problem, but you can make them. They're not necessarily products that are patented and restricted, so it's difficult to very easily commoditize AI, which I think is probably a good thing. But I'd want to hear your thoughts on that.

So it's almost like, and this analogy is way too often used, the nineties and the internet: open source. Or even a sub-analogy to that, the protocols for email. They're open, so you couldn't find "the email company". There were companies involved, building things and products on top of it, but the protocol wasn't necessarily their thing. So maybe it's more helpful for me to think of AI as an open-source kind of project, almost. That makes sense to me. I just do have trouble with Google, or even Amazon: yes, they're getting a lot from AWS, but they're also getting a lot from other revenue sources. Same with IBM.
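[Editor's aside on the claim that the building blocks of AI are open and that "you can make a neural network in a day": below is a minimal, purely illustrative sketch of a tiny neural network trained on the XOR problem, in Python with NumPy. The layer sizes, learning rate, and iteration count are arbitrary choices for the example, not anything the hosts discuss.]

```python
import numpy as np

# A tiny two-layer neural network trained on the XOR problem with
# plain gradient descent. Purely illustrative; sizes and learning
# rate are arbitrary choices, not anyone's production system.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR truth table: inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))  # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))  # hidden -> output weights
b2 = np.zeros((1, 1))

first_loss = last_loss = None
for step in range(20000):
    # Forward pass.
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    last_loss = float(np.mean((out - y) ** 2))
    if step == 0:
        first_loss = last_loss
    # Backward pass: (unscaled) gradient of squared error through the sigmoids.
    d_out = (out - y) * out * (1 - out)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    # Gradient descent update.
    W2 -= hidden.T @ d_out
    b2 -= d_out.sum(axis=0, keepdims=True)
    W1 -= X.T @ d_hidden
    b1 -= d_hidden.sum(axis=0, keepdims=True)

print(f"loss: {first_loss:.4f} -> {last_loss:.4f}")
```

With enough iterations the loss typically drops toward zero and the outputs approach the XOR table, the classic toy example of a problem no single linear model can solve, which is the sense in which the pieces are open and reproducible rather than patented products.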
Okay, so how do you delineate between AI and just data, between AI revenue and all the revenue they get from storing data or helping people parse data? How do you really delineate between the two? I guess I was looking for a pure play in AI, which may not quite be here yet, and that's fine. So, to pick your brain, since you mentioned your background doing consulting for small and medium businesses: I've heard, just from the little overlap between pop culture and the investing world, some large concerns about AI being an existential threat to humanity, possibly taking over the world and creating what we might call an Ultron or a Jarvis. Obviously a lot of ink has been spilled and a lot of film rolled on sci-fi movies and novels about an all-seeing AI, which again is really bad branding; because the field came from the academic world, they didn't market it correctly. But is this something we should be concerned about, just from a human perspective? And then, how does that translate into an investment thesis regarding AI? But first let's handle the fun part, the apocalyptic human aspect.

Yeah. What you're describing is usually called general artificial intelligence, or general AI, which describes a system that, once it's turned on, can operate completely on its own with all of the major faculties of human intelligence. It's self-aware, it's learning, it can actualize its own learning, it can guide itself to learn what it wants to learn and take the actions it wants to take. It understands that it exists and that it has agency. This is the Skynet, the Jarvis, this kind of thing.
As for the apocalyptic what-happens-if-or-when-we-reach-general-AI question: the truth is we're not there yet by any stretch of the phrase, but there are estimates that maybe we'll get there by 2050, or 2030, or something like that. Ultimately it's speculation. But the interesting thing about something like a general artificial intelligence as a system, and the biggest concern, comes if it's connected to the internet. There's so much information on the internet that we as human beings can't process it all, because we simply don't have the energy or the time. If there were some sort of general artificial intelligence that could learn at the pace of whatever RAM and computing power it has, and actualize that full potential, why would it choose to serve us? It could figure out a way to reroute itself to a different IP, and move, and be this fluid object that is all over the internet. It is science fiction for now. Maybe we will reach general artificial intelligence, but there's also the possibility that we never reach it, that it's something that cannot be completely actualized. There's a host of limitations, starting with the fact that we're still trying to figure out how the human brain works: how we understand causality at a very basic level, how we interpret, how we communicate, how we perceive and think and learn. There's been an enormous amount of innovation and advancement there, and the fact that any kind of AI exists at all is really impressive. But I think general artificial intelligence is farther out than most people think. At the same point, there's some validity to the idea that if it happens, you have to ask: why would the general artificial intelligence care about humanity?
Why would it have any problem with just figuring out how to survive as best it can, and wiping us out, or enslaving humanity, et cetera? But one of the interesting hypotheses I've heard is that if we reach general artificial intelligence, it could happen that more than one team of researchers gets there at the same time. Then there would be two or more, and they'd compete with each other, locked in this eternal artificial-intelligence combat, which I think is just wild to think about. A sci-fi triple-decker sandwich, yeah. So maybe there are already multiple general artificial intelligence systems out in the great ether of the internet, competing in a locked, eternal struggle. It's very unlikely, but it's a fun thought experiment. They're often compared to a virus: it will mutate however it can to survive. And the important thing is, we don't even know the mechanism for teaching an artificial intelligence system how to survive, how to learn that it exists, that it has any purpose or meaning. We really don't understand that at a rudimentary level. I think there is enough healthy skepticism and healthy worry; there's definitely some over-worry too, some doomsayers saying AI is going to be the apocalypse. Same thing with jobs, not Steve Jobs, but working jobs. AI has been hyped, sometimes by the same people, both as the thing that's going to create a better world and as the bad thing that's going to take away everyone's jobs, when really, right now, it's a tool: a tool meant to assist with jobs, with understanding work, and with increasing efficiency.
And yes, for certain, AI has displaced jobs already: robots powered by artificial intelligence that can flip hamburgers or sort mail or do these different tasks. But you need more and more data scientists and machine learning engineers, more and more people keeping these systems up and teaching them new things. So at the end of the day I think it's simply another form of creative destruction, like the carriage drivers before the automakers, and then you have to go through retraining. There's a human aspect to that. I once was very bullish, in an uncaring way, on innovation: it doesn't matter, it's just going to create more jobs, so what's the problem? Yeah, but now the person who was flipping burgers or delivering mail has to figure out what they're going to do, and their skill set may not be enough to go become the engineer who builds the robot that took their job. Or they may be at a point where they don't want to get retrained, or can't, for a variety of reasons. So there's a human aspect you have to think through. I just wanted to push back a little bit. No, that's fair. There's no great solution for that, and we're getting a little far afield from artificial intelligence, but one good solution, I think, is personal ownership: the person who sees that coming down the pike preparing for it, or asking for help to prepare for it. I think that's a fantastic solution. So it doesn't sound like general AI will be a near-term problem within the next 10 years, but it seems like there's some potential for it to be an issue in the long run. What kind of percentage risk would you put on that, if you can? I'm not expecting you to be a hundred percent accurate, but I'm saying long run as in 30-plus years.
So we're talking 2051 or later. That would be within our lifetime, too, so this is important for us right now, and important for probably most of our listeners too. I hope you will be around in 30 years. Not the over-under, but the percentage risk of that actually happening: one singular general AI being in existence. I won't say threat, because maybe it won't be a threat, but in existence. That's a great question. By 2051, a general artificial intelligence being in existence, whether a threat or otherwise? That's a really difficult question. I know I'm putting you on the spot. Yeah. Just to give you a raw number, I would say probably twenty-five percent, which I think is a very optimistic number, optimistic as in optimistic towards it actually being real. Yes. Yeah, because I think people who guarantee it's going to be here by 2030 are trying to get clicks; things are much more complex. The scientific community as a whole is in a revolution of understanding causality, and we're starting to really understand why our brains think the way that we do, why we make mistakes. We have behavioral science and heuristics, and we're learning a lot more about ourselves, which I think will definitely speed up artificial intelligence research and the possibility of creating a general artificial intelligence, the more we understand ourselves and our brains. But at the same time, there's a huge leap from a physical system like our brain, which has evolved for as long as it has and has very special properties that we don't fully understand yet, to trying to replicate a non-corporeal system that can do the same things that we do. It's very difficult. And I think it was Elon Musk who said it himself.
I think he said something around 2010 about autonomous driving, like, we're going to have it locked down by 2015. The years might not be correct, but it was some span of time like that. And then right before the year he said it was going to happen, he released this official statement of, yeah, it's a lot harder than we thought to do fully autonomous driving. So think about it: if we can't do fully autonomous driving yet, how far away are we from full artificial intelligence? But it is really amazing and magical to see semi-autonomous driving in practice. It's pretty incredible, the leaps that have been made in the last 10, 20 years in AI, stuff that would have seemed like magic in 2000. If you stepped into a Tesla in 2000 and the person's not holding on and it's just moving and doing its own thing, or they summon it and it just drives up, you'd be like, okay, where's the person hiding who's driving it? It's pretty magical what has already been accomplished in AI. So my outlook personally, in a portfolio and long-run view, is more to be in awe of what we've already accomplished in the field, and to be excited about that. I don't really think about general artificial intelligence on a regular basis, and I'm not super worried about it. I guess the one exception to that is China, which is very rapidly expanding all of its AI programs. So you could definitely see some uses of AI that could be questionable morally. Militarily, there could be a very solid incentive to come up with something very close to a general AI for warfare, which sounds like, gosh, WarGames, the movie with the nuclear launch codes and everything, something like that. It's possible, but ultimately it's not as possible, or not as soon, as people think. And I'm not necessarily super worried about it.
I'm more excited about the possibilities, excited about the things that have already been done that are improving lives and making things better and cooler and more magical. I think you bring up a good point about the human aspect of not simply letting jobs die out. I think that's an important thing to note with AI as well, and an important conversation to have policy-wise. But yeah, my overall outlook is just this excitement, and the answer to the question, I would say, is a 25% chance of a general artificial intelligence, that we either know or do not know about, by 2051. That makes sense. Yeah, it is interesting. I was reflecting on the fact that when we don't know the future and we're trying to predict it, trying to predict the path of innovation or where something is headed, we often use the past, because that's our only frame of reference. And we all know that past returns are not indicative of future results. We all know that, but we still take that mindset and framework and apply it. And we do the same thing with tools: we apply old measurement tools to new forms of growth, and that doesn't always work. In fact, from an economic perspective, that could be an argument against GDP; maybe that's not a really great tool anymore. So for instance, in all of the conversation around general AI, we're applying human characteristics that may not even be on the radar, or even close to being part of what we might call general AI in the future. We're applying these moral characteristics just because that's the only framework we know. And I think that's what's really interesting, and obviously it's what makes the future hard to predict: we don't know what it's going to be, and we also don't have a way to measure it or plan around it. Which also makes it exciting. It'd be really boring if we knew what was going to happen.
So that's all very helpful from a just-abating-my-nightmares perspective, but from an investment perspective, you mentioned some tickers earlier. Again, you heard where I stand; I seem very bearish, and I don't mean to sound so negative, but it just seems like there's not really a pure play in AI. Maybe C3.ai is, but what would you do from a portfolio perspective in light of this whole conversation around AI? Sure. I think that, at least in terms of their positioning and what they're saying, C3 is probably one of the best pure-play options out there. I personally own some. I think C3 is probably a good long-term play for AI, and they'll get bigger or get bought out, or who knows. I think they're probably one of the top pure-play-style options. But I also really like IBM, Itron, Palantir; there are lots of options. And then if you want to go for more of the chips that AI needs to run: Nvidia, KLA, Qualcomm, Taiwan Semiconductor. But I'd say if you really want to be mostly exposed to AI itself: probably C3.ai, probably IBM, probably Itron for kind of industrial AI. Personally, I need to do more research myself, and if you really wanted to find pure plays in AI, do some research; again, no financial advice here, but always look more in depth. I would be surprised if there weren't other pure-play AI kinds of things, but like you said, they could be private or just not very well known either, right? Yeah. It's tough from just a trading perspective: when you get to really unknown companies, sometimes they're super thinly traded and you can hardly get enough volume to make it make sense getting in or getting out, which is often more of the issue. So would you at all recommend ETFs, like a sector ETF for AI? I would tend to think, with something so open source, this would be a bad way to get it. But again, that's me going bearish again, which is the theme of this conversation.
So would you, not recommend, but would you even consider for yourself using an ETF to get exposure to AI? I probably would. I would need to do more research on what their filtering and vetting is as an ETF, but yes, I would say an AI ETF, or a quantum computing ETF, is something I'm looking at as well. I think that might be valuable, because then you don't necessarily have to do an enormous amount of digging; you're leaving that up to the fund managers. And personally, I would probably hold the stocks I currently have, the C3.ai, the Itron, the Palantir, the IBM, and simply add the ETF on, and not sell my other AI plays to buy the ETF. It'd be beefing up that portion of your portfolio. Exactly. I wouldn't swap it out, because I think the individual stocks are all a good play, and especially for something like IBM, Itron, or Palantir, they're not just AI, so you're not going to get hosed if some government regulation comes out and bans AI or something crazy. There is a lot of negativity around AI, general artificial intelligence and all that, but no, I'm not worried about that happening. A possible black swan event, though? You never know. So yeah, I would look into ETFs as well; just do some due diligence on them. This has definitely been enlightening for me, because as you can tell, I'm still pretty bearish. I'm still not real positive on buying any sort of AI stock, or exposure to AI, but I discovered through our conversation that I already own Palantir, which is already somewhat exposed to AI. So I didn't even know it, but I'm an AI investor. I appreciate you sharing your knowledge. It's been very enlightening for me, and hopefully it was enlightening for all of you. If you enjoyed this episode or any of the other ones, please hit us up with a five-star review on whatever podcast platform you're listening on.
We definitely appreciate it; it helps us get out to a larger audience. So yes, we will see you next time on The Long Run Show for the next episode, where we will talk about some topic in the long run. Yep, we'll catch you later. Thanks for listening. Support this podcast at: https://redcircle.com/the-long-run-show/donations
Dr. Sarah Stone, Executive Director of the eScience Institute at the University of Washington, and Dr. Jing Liu, Managing Director of the Michigan Institute for Data Science at the University of Michigan, join Innovators to discuss the burgeoning and expanding field of data science. Dr. Sarah Stone is the Executive Director of the eScience Institute. Stone handles eScience operations and planning, develops research and training programs, participates in strategic planning, and serves as the primary contact for university and industry partners, funding agencies, and the public. As eScience Director of Data Science Education since 2018, Stone co-leads the Education and Career Paths Special Interest Group and helps departments across campus develop data science specializations. Stone also directs the UW Data Science for Social Good (DSSG) program, which partners student fellows from across the country with project leads from academia, government, and the private sector to find data-driven solutions to pressing societal challenges. Stone sits on the executive committees of the eScience Institute and the UW Center for Studies in Demography and Ecology (CSDE). Since 2015, as a Deputy Director of the West Big Data Innovation Hub (WBDIH), she has focused on building and strengthening partnerships across the western U.S., with a particular focus on urban data science and data-enabled scientific discovery and training. Dr. Jing Liu, Managing Director, has been at the Michigan Institute for Data Science since 2016. She oversees the operation of the institute and its staff team, designs programs to implement the institute's mission, and builds academia-industry-government-community partnerships. In addition, she plans and coordinates research activities to enable groundbreaking, transformative, and reproducible data science research. Dr. Liu's past research management experience includes coordinating mental health research, biostatistics, and grant proposal review.
Her research experience includes visual neuroscience with human brain imaging and animal electrophysiology, genetics, and most recently science policy. She also writes and translates works about science and education, and has received multiple awards, including the China Book Award and the China National Library Book Award. Innovators is a podcast production of Harris Search. *The views and opinions shared by guests on Innovators do not necessarily reflect the views of their institutions or organizations.*
As part of Leeds University Business School's #InternationalWomensDay series, Dr Shahla Ghobadi (University of Leeds) is joined by Dr Amber Young (University of Arkansas) to discuss how women are advancing the field in information systems, and what the challenges of being a woman in the tech industry are. This podcast episode was recorded in March 2021 via remote recording. If you would like to get in touch regarding this podcast, please contact research.lubs@leeds.ac.uk. A transcript of this episode is available at: https://business.leeds.ac.uk/downloads/download/218/podcast_episode_6_iwd_2021_-_transcript Bios: Dr Shahla Ghobadi is an Associate Professor of Information Management at Leeds University Business School. Her research contributes to advancing knowledge on the social and human aspects of developing software. Shahla has worked in automobile, eScience, and software industries as a consultant and project manager. Industry has inspired and significantly enriched her research on collaboration in software development firms and social change. Dr Amber Young is Director of the ISYS PhD Program and Assistant Professor of Information Systems in the Sam M. Walton College of Business at the University of Arkansas. Her research interests include how IT can be designed, developed and implemented for social and organizational good.
During the episode we discuss something Professor Beck is passionate about: getting more scholars into the data science fold. The University of Washington's eScience Institute launched in 2008. Its practice has changed over the years as it examines how to extract knowledge and useful, actionable information from data. An effort that started out serving engineering and science disciplines now includes arts and humanities scholars. David helps foster the interdisciplinary sharing of knowledge to apply data approaches across many subject matters. We explore how Dr. Beck's undergraduate computer science studies set the ground for what turned out to be a search to deliver impact for the good of society. Great work is happening at the University of Washington and in David's personal research. Hear about it in the latest episode of the 26.1 AI Podcast.
Dr. David Beck earned a PhD in Biomolecular Structure and Design and is the eScience Institute's Director of Research and a Research Associate Professor in Chemical Engineering. Dr. Beck has been associated with eScience since 2009, formerly serving as the Director of Research for the Life Sciences. Beyond his biology and chemistry domain expertise, Dr. Beck provides experience in scientific data analytics and mining, parallel programming techniques for data-intensive computing and high-performance computing applications, and general software design & engineering support. [1:10] What does post-doc mean: undergrad -> PhD -> post-doc (2 to 3 years to polish your skills). Biomolecular Structure and Design (DNA/RNA/protein design): how can you design new proteins and molecules? [3:05] Why start with computer science: he grew up when computers were becoming a commodity, so he knew he wanted to do something with computers. His internship shaped his direction into biology and chemistry. The equivalent degree today would be bioinformatics, using computing with biology. [5:45] How David is leveraging his computer science degree: they use computers to find hidden structures in experimental data around biomolecules. [8:40] What is bioinformatics: a lot of data is generated, and you need to know the right statistical models to apply to the data. [9:20] Opportunities exist in pharmaceutical companies; designing microorganisms to remediate a contaminated site is another example career. [12:10] What has David fired up is the broad adoption of data science methods in all domains of technology. There is a data revolution, called data science. [13:20] Getting through college: try new things; don't get stuck with just the classes to get your degree; expand outside of the core classes. We joked about gaming, but check out the Foldit game. [15:50] Aha moment: during graduate work he was focused on simulation of proteins, and he wrote those simulations.
So he thought, what if we simulated thousands of molecule simulations and generated tons of data? But in the end the data was not really knowledge; they just had a lot of data. [18:40] Best advice: you are not an impostor; this is a self-inflicted wound, it is all in your head and not true. And a habit: make sure there is something every day in your job that you really love to do. Favorite books, both by Richard Rhodes: "The Making of the Atomic Bomb" and "Energy: A Human History". Free audiobook from Audible: you can get a free book at www.stemonfirebook.com and can cancel within 30 days and keep the book of your choice at no cost.
Luis Alexander Calvo explains how computer science, collaborative work, and virtual science serve to solve the major problems afflicting the country.
Podcast accompanying the article by Thomas Köhler, Ansgar Scherp, Claudia Koschtial, Carsten Felden & Sabrina Herbst.
Why does Denmark need a supercomputer specifically for life science? In this episode, Peter Løngren, head of supercomputing at DTU Bioinformatics, explains why life science research places very particular demands on a supercomputer. He also talks about the development of Denmark's national life science supercomputer, Computerome. Computerome is physically located at DTU Risø near Roskilde, but thanks to cloud technology it today serves large international research consortia wherever in the world they are physically located. You can read more about Computerome at https://www.deic.dk/computerome. Read more about supercomputers and eScience on the knowledge portal: https://vidensportal.deic.dk
In a basement at the University of Southern Denmark (SDU) you will find one of Denmark's three national HPC facilities: the supercomputer Abacus 2.0. SDU professor Claudio Pica talks about the development of Abacus and the new possibilities the supercomputer has brought to Danish research. Also featured are Martin Zachariasen, Dean of Natural Sciences at SDU, and University of Copenhagen researcher Guy Schurgers, who uses Abacus for his research on climate models. You can read more about Abacus 2.0 at: https://abacus.deic.dk. Read more about supercomputing and eScience on the knowledge portal: https://www.deic.dk/vidensportal
What is a supercomputer? What role do supercomputers play in research today? How do you get access to one? And what are we actually doing in Denmark to ensure that our researchers have the supercomputing resources they need? These are some of the questions answered in this first episode of Supercomputing in Denmark, in which I travel to Supercomputing Day at DTU and talk with Professor of Bioinformatics Søren Brunak, chairman of DeiC's eScience committee Josva Kleist, and head of the eScience competence centre Lene Krøl Andersen. You can read more about supercomputing and eScience on the knowledge portal: https://www.deic.dk/vidensportal
Our guest is Jake VanderPlas, who is a real Data Science Fellow at the UW eScience Institute, and is the author of the recently published Python Data Science Handbook. Topics discussed include: BiCapitalization; bridging the astronomy-astrology divide; whether the e in eScience is the same as the e in eBay; Myers-Briggs; why Jake keeps saying "transcriptable" like it's a real word; whether newsletters have replaced blogs; how to talk to people at parties; what it's like being a data scientist in academia; whether students still hook up in the library; why your zodiac sign isn't what you think it is; whether Pluto is a planet and why people care; teaching statistics using simulation; how to pronounce "numpy"; using deep learning to identify new constellations; and building an AI that chooses the right data visualization. This week's theme music has a cool Vegas swanky lounge vibe. Please listen to it.
Cathryn Carson is an Associate Professor of History and the Operational Lead of the Social Sciences D-Lab at UC Berkeley. Fernando Perez is a research scientist at the Henry H. Wheeler Jr. Brain Imaging Center at UC Berkeley. Berkeley Institute for Data Science. Transcript. Speaker 1: Spectrum's next. Speaker 2: Okay. [inaudible] [inaudible]. Speaker 1: Welcome to Spectrum, the science [00:00:30] and technology show on KALX Berkeley, a biweekly 30-minute program bringing you interviews featuring Bay Area scientists and technologists as well as a calendar of local events and news. Speaker 3: Hi, good afternoon. My name is Brad Swift. I'm the host of today's show. This week on Spectrum we present part one of our two-part series on big data at Cal. The Berkeley Institute for Data Science, or BIDS, is only [00:01:00] four months old. Two people involved with shaping the institute are Cathryn Carson and Fernando Perez, and they are our guests. Cathryn Carson is an associate professor of history, associate dean of social sciences, and the operational lead of the Social Sciences Data Lab at UC Berkeley. Fernando Perez is a research scientist at the Henry H. Wheeler Jr. Brain Imaging Center at UC Berkeley. He created the IPython project while a graduate student in 2001 [00:01:30] and continues to lead the project. Here is part one with Cathryn Carson and Fernando Perez. Welcome to Spectrum. Thanks for having us. And I wanted to get from both of you a little bit of a short summary about the work you're doing now, and the activity that predates your interest in data science. Speaker 4: Data science is kind of an ill-defined term, I think, and it's still an open question precisely what it is. But in a certain sense all of my research has probably been under the umbrella [00:02:00] of what we today call data science since the start.
I did my PhD in particle physics, but it was computational particle physics, and I was doing data analysis, in that case of models that were computationally created. So I've sort of been doing this really since I was a graduate student. What has changed over time is the breadth of disciplines that are interested in these kinds of problems, in these kinds of tools, and that have these kinds of questions. In physics this has been kind of a common way of working for a long time: the deep intersection [00:02:30] between computational tools and large data sets, whether they were created by models or collected experimentally, is something that has a long history in physics. Speaker 4: How long? The first computers were created to solve differential equations, to plot the trajectories of ballistic missiles. It was one of the very first tasks that computers were created for, so almost since the dawn of computing. So it's really only recently, though, that the size of the data sets has really jumped. Yes, the size has grown very, [00:03:00] very large in the last couple of decades, especially in the last decade, but I think it's important not to get too hung up on the issue of size, because when we talk about data science, I like to define it in the context of data that is large for the traditional framework, tools, and conceptual structure of a given discipline, rather than its raw absolute size. Because yes, in physics, for example, we have some of the largest data sets in existence, things like what the LHC creates [00:03:30] for the Higgs boson. Those data sets are just absurdly large, but in a given discipline, five megabytes of data might be a lot, depending on what it is that you're trying to ask.
And so I think it's much, much more important to think of data that has grown larger than a given discipline was used to manipulating, and that therefore poses interesting challenges for that given domain, rather than being completely focused on the raw size of the data. Speaker 1: I approach this from an angle that's actually complementary to Fernando's, in part because [00:04:00] my job as the interim director of the Social Sciences Data Laboratory is not to do data science but to provide the infrastructure, the setting, for researchers across the social sciences here who are doing that for themselves. And in the social sciences you see a nice exemplification of the challenge of larger sizes of data than were previously used, and of new kinds of data as well. So the social sciences are starting to pick up, say, on [00:04:30] sensor data that has been placed in environmental settings in order to monitor human behavior, and social scientists can then use that in order to design tests around it, or to develop ways of interpreting it to answer research questions that were not necessarily anticipated by the folks who put the sensors in place. Or they access data that comes out of human interactions online, which is created for entirely different purposes [00:05:00] but makes it possible for social scientists to understand things about human social networks. Speaker 1: So the challenges of building capacity for disciplines to move into new scales of data sets and new kinds of data sets are ones that I've been seeing as I've been building up D-Lab, and that we've jointly been seeing as we tried to help scope out what the task of the Berkeley Institute for Data Science is going to be. How about the emergence [00:05:30] of data science? Do you have a sense of the timeline, when you started to take note of its feasibility for the social sciences, irrespective of physics, which has a longer history?
One of the things that's been driving the conversations in the social sciences is actually the funding regime, in that the existing, beautifully curated data sets that we have from the post-World War Two period, principally survey data, with administrative data on top of that, [00:06:00] are extremely expensive to produce, to curate, and to maintain. Speaker 1: And as the social sciences in the last five to ten years have been weighing the portfolio of data sources that are supported by funding agencies, we've been forced to confront the fact that the maintenance of the post-World War Two regime of surveying may not be feasible into the future, and that we're going to have to be shifting to other kinds of data that are generated [00:06:30] for other purposes, repurposing and reusing them, finding new ways to cut and slice them in order to answer new kinds of questions that weren't accessible to the old surveys either. So one way to approach it is through the infrastructure that's needed to generate the data that we're looking at. Another way is simply to look at the infrastructure on campus. One of the launching impetuses for the Social Sciences Data Laboratory was in fact the budget cuts of 2009 [00:07:00] here on campus, when we acknowledged that if we were going to support cutting-edge, methodologically innovative social science on this campus, we were going to need to find ways to repurpose existing assets and redirect them towards whatever this new frontier in social science is going to be. Speaker 5: You are listening to Spectrum on KALX Berkeley. Cathryn Carson and Fernando Perez are our guests. [00:07:30] They are part of the Berkeley Institute for Data Science, known as BIDS. Speaker 4: Fernando, you sort of gave us a generalized definition of data science. Do you want to give it another go, in case you evoke something else? Sure.
I want to leave that question slightly unanswered, because I feel that to some extent one of the challenges we have, as an intellectual effort that we're trying to tackle at the Berkeley [00:08:00] Institute for Data Science, is precisely working out what this field is, right? I don't want to presuppose that we have a final answer on this question, but at least we do know that we have some elements to frame the question, and I think it's mostly about an intersection. It's an intersection of things that were being done already on their own, but that were often being done in isolation. So it's the intersection of methodological work, and by that I mean things like statistical theory, applied mathematics, computer science, [00:08:30] algorithm development, all of the computational and theoretical mathematical machinery that has been done traditionally; the questions arising from domain disciplines that may have models, that may have data sets, that may have sensors, a telescope, or a gene sequencing array, where they have their own theoretical models of their organisms or galaxies or whatever it is, and where that data can be inscribed; and the fact that tools need to be built. Speaker 4: Data doesn't get analyzed by blackboards; data gets analyzed by software. But this is software that is deeply woven [00:09:00] into the fabric of these other two spaces, right? It's software that has to be written with knowledge of the questions, the discipline, and the domain, and also with knowledge of the methodology, the theory.
It's that intersection of this triad of things, of concrete representation in computational machinery, abstract ideas and methodologies, and domain questions, that in many ways creates something new when the work has to be done simultaneously with enough depth and enough rigor in all [00:09:30] of these three directions. And precisely that intersection is where the bottleneck is now proving to be, because you can have the ideas, you can have the questions, you can have the data, you can have the theorems, but if you can't put it all together into working concrete tools that you can use efficiently and with a reasonably rapid turnaround, you will not be able to move forward. You will not be able to answer the questions you want to answer about your given discipline. And so that embodiment of that intersection is, I think, where the challenge is posed. Maybe there is something new called [00:10:00] data science. Speaker 1: I'd actually like to suggest that the indefinable character of data science is actually not a negative, because it's an intersection in a way that we're all still very much struggling to define. I won't underplay that. Exactly in that it's an intersection, it points to the fact that it's not just an intellectual thing that we're trying to get our heads around. It's a platform for activity, for doing kinds of research that are either enabled or hindered by the [00:10:30] existing institutional and social structures that the research is getting done in. And so if you think of it less as a kind of concept or an intellectual construct and more as a space where people come together, either a physical space or a methodological sharing space, you realize that the indefinableness is a way of inviting people in, rather than drawing clear boundaries around it and saying, we know what this is, it is X and not Y. Speaker 4: [00:11:00] Is the Berkeley Institute for Data Science where this comes in, this invitation, this collection of people and the intersection?
That's sort of the goal of it. Speaker 1: That's what we've been asked to build: not an institute in the traditional sense of there being folks inside and outside, but a meeting point and a crossing site for folks across campus. That's [00:11:30] something that's been put in front of us by the two foundations who have invested a significant sum of money in us, the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation. And it's also become an inspiring vision for those of us who have been engaged in the process, over the last year and a half, of envisioning what it might be. It's an attempt to address the doing of data science as an intersectional area within a research university that has existing structures, [00:12:00] silos, and boundaries within it. Speaker 4: And to some extent you try to deconstruct the silos and leverage the work done by one group, share it with another. You know, the concrete mechanisms are things that we're still very much working on, and we will see how it unfolds. There's even a physical element that reflects this idea of being at a crossroads, which is that the university was willing to commit to [inaudible] the physical space of one room in the main Doe Library, which is physically [00:12:30] at the center of the university. And that is very important, because it means that it is quite literally at the crossroads. It is one central point that many of us walk by frequently, so it's a space that is inviting in that sense too: to encounters, to stopping by, to easy collaboration, rather than being in some far corner of the campus. Speaker 4: But also intellectually, the library is traditionally the store of the cultural and scientific memory of an institution. 
And so building this space in the library is a way of signaling [00:13:00] to our community that it is meant to be a point of encounter. How specifically those encounters will be embodied, and what the concrete mechanisms will be, sharing tools, sharing code, sharing data, having lecture series, having joint projects: we're in the process of imagining all of that, and we're absolutely certain that we'll make some mistakes along the way. But the intent is very much to have something which is, by design, about as openly and as explicitly collaborative as we can make it. And I think [00:13:30] in that sense we are picking up on many of the lessons that Cathryn and her team at the D-Lab have already learned, because the D-Lab has been in operation here in Barrows Hall for about a year and has already done many things in that direction, and I personally see them as things in the spirit of what BIDS is attempting to do at the scale of the entire institution. The D-Lab has been blazing that trail for the last year in the context of the social sciences, to the point where their impact has actually spread beyond the social sciences, because so many of the things that they were doing were [00:14:00] found to have very thirsty customers for the particular brand of lemonade that they were selling here at the lab. We hope to take a lot of these lessons and build on them with a broader scope. Speaker 1: And in the same way, BIDS sits at the center of other existing organizations, entities, and programs on campus which are also deeply engaged in data science. Some of them are research centers; others are programs like the data science master's program in the School of Information, where [00:14:30] there is a strong and deliberate attempt to think through how, in an intelligent way, to train people for doing data science outside the university. 
So all of these centers of excellence on campus have the potential to get networked in a much more synergistic way with the existence of BIDS, which is by no means encompassing all of the great work that's getting done in teaching and research around data science on this campus. Speaker 6: [00:15:00] Spectrum is a public affairs show on KALX Berkeley. Our guests are Cathryn Carson and Fernando Perez. In the next segment they talk about challenges the Berkeley Institute for Data Science faces. Speaker 2: [inaudible] Speaker 3: And it seems that eScience does happen best in teams, and multidisciplinary [00:15:30] teams, or is that not really the case? Speaker 1: I think we've been working on that assumption, in part because it seems too much to ask of any individual to do all the things at once. At the same time, we do have many specimens of individuals who cross the boundaries of the three areas that Fernando was sketching out as domain-area expertise, hacking skills, and methodological competence. [00:16:00] And it's interesting to think through the intersectional individuals as well. But that said, the default assumption, I think, is going to have to be that teamwork, collaboration, and actually all of the social engineering to make that possible is going to be necessary for data science to flourish. And again, that's one of the challenges of working in a research university setting, where teamwork is sometimes prized and sometimes deprecated. Speaker 4: That goes back to the incentives. People building tools don't necessarily get much attention or [00:16:30] prestige from that. How do you defeat that on an institutional level, within the institute or just the community? Ask us in five years if we had any success. That's one of the central challenges that we have, and it's not only here at Berkeley. There's actually kind of an ongoing worldwide conversation happening about this. 
Every few days there's another article where this issue keeps being brought up again and again, and it's rising in volume. The business of creating tools is actually becoming an increasing [00:17:00] part of the job of people doing science. And so, for example, even young faculty who are on the tenure track are finding themselves pushed against the wall, because they're finding themselves writing a lot of tools and building a lot of software, having to do it collaboratively, having to engage others, picking up all of these skills, and this being an important, central part of their work. Speaker 4: But they feel that if their tenure committee is only going to look at their publication record, and [00:17:30] 80% of their actual time went into building these things, they are effectively being shortchanged for their effort. And this is a difficult conversation. What are we going to do about it? We have a bunch of ideas. We are going to try many things. I think it's a conversation that has to happen at many levels. Some agencies are beginning to act; the NSF recently changed the terms of its biosketch requirements, for example. The section that used to be called relevant publications is now called relevant publications and other research outcomes, and in parentheses they explain, such as software [00:18:00] projects, et cetera. So this is beginning to change. The community that curates, for example, large data sets, that's a community that has very similar concerns. It turns out that working on a rich and complex data set may be a labor that requires years of intensive work, and may be a full-time endeavor for someone. Speaker 4: And yet those people may end up getting little credit for it, because maybe they weren't the ones who used that data set to answer a specific question. But if they're left in the dust, no one will do that job. Right. 
And so [00:18:30] we need to acknowledge that these tasks are actually becoming a central part of the intellectual effort of research. And maybe one point that is worth mentioning in this context of incentives and careers is that we, as the institution of academic science in a broad sense, are facing the challenge today that these career paths and these kinds of intersectional problems in data science are right now extremely highly valued by industry. [00:19:00] What we're seeing today with this problem is genuinely of a different scale, and different enough to merit attention and consideration in its own right. Because what's happening is that the people who have this intersection of skills, talents, and competencies are extraordinarily well regarded by industry right now, especially here in the Bay Area. Speaker 4: I know the companies that are trying to hire, and I know that people are going there, and the good ones can effectively name their price; they can name their price to go into contexts that are not [00:19:30] boring. A lot of the problems that industry has right now with data are actually genuinely interesting problems, and they often have data sets that we in academia have no access to, because it turns out that these days the amount of data being generated by web activity, by apps, by personal devices that create and upload data, is actually spectacular. And some of those data sets are really rich and complex, and material for interesting work. And industry also has the resources, the computational resources, the backend, the engineering expertise, [00:20:00] to do interesting work on those problems. And so we as an academic institution are facing the challenge that we are making it very difficult for these people to find a space at the university. 
Yet they are critical to the success of modern data-driven research and discovery, and across the street they are being courted by an industry that isn't just offering them money to do boring work. It's actually offering them respect, yes, compensation, but also respect and intellectual space and a community that values their work, and that's something [00:20:30] that is genuinely an issue for us to consider. Speaker 4: Is there a way to cross-pollinate between the academic side and industry, and work together on building a toolkit? Absolutely. We've had great success in that regard in the last decade in the space that I'm most embedded in, which is the space of open-source scientific computing tools in Python. We have a licensing model for most of the tools in our space that [00:21:00] is open source but allows for very easy industry reuse, and what we find is that that has enabled a very healthy two-way dialogue between industry and academia in this context. Yes, industry uses our tools, and they often use them in a proprietary context, but they use them for their own problems and for building their own domain-specific products and whatever. But when they want to contribute to the base tool, the base layer if you will, it's much [00:21:30] easier for them. Speaker 4: They simply make the improvements out in the open, or they just donate resources. They donate money. Microsoft Research last year made a $100,000 donation to the IPython project, which was strictly a donation. This was not a grant to develop any specific feature. This was a blanket, hey, we use your tools and they help what we build, and so we would like to support you. And we've had a very productive relationship with them in the past, but it's by no means the only one here at Berkeley. 
The AMPLab, whose two co-directors, Ion Stoica and Mike Franklin, are actually part of the team [00:22:00] that is working on BIDS, has a very large set of tools for data analytics at scale that is now widely used at Twitter and Facebook and many other places. They have industry-oriented conferences around their tools; they now have a conference they run twice per year, and large bootcamps, and large fractions of their attendees come from industry, because industry is using all of these tools. And currently more of the AMPLab's funding [00:22:30] comes from industry than from sources like the NSF. And so I think there are actually very clear and unambiguous examples of models where the open-source work that is coming out of our research universities can have a highly productive and valuable dialogue with industry. Speaker 3: It seems like, long term, you would have a real uphill battle to create enough competent, data-trained people to [00:23:00] quench both industry and academia, so that there would be a calming of the flow out of academia. Speaker 4: As we've said a couple of times in our discussions, this is a problem. It's a very, very challenging set of problems that we've signed up for, but we feel that it's a problem worth failing on, in the sense that we know the challenge is a steep one. But at the same time, the questions are important enough to be worth making the effort. Speaker 6: [inaudible] [00:23:30] Don't miss part two of this interview in two weeks, on the next edition of Spectrum. Spectrum shows are archived on iTunes University; we've created a simple link for them: tinyurl.com/kalxspectrum. Now, the science and technology events happening Speaker 3: locally over the next two weeks. [00:24:00] Enabling a Sustainable Energy Infrastructure is the title of David Culler's presentation. 
The talk is on Wednesday, April 9th. David Culler is the faculty director of [inaudible] for Energy and the chair of computer science at UC Berkeley. He was selected in Scientific American's Top 50 Researchers and in Technology Review's 10 Technologies That Will Change the World. His research addresses networks of small embedded wireless devices, planetary-scale internet services, parallel computer architecture, [00:24:30] parallel programming languages, and high-performance communications. This event is free and will be held in Sutardja Dai Hall, Banatao Auditorium, Wednesday, April 9th at noon. Cal Day is April 12th, 8:00 AM to 6:00 PM, with 357 events; for details, go to the website, calday.berkeley.edu. A lunar eclipse: Monday, April 14th at 11:00 PM, [00:25:00] look through astronomical telescopes at the Lawrence Hall of Science to observe the first total lunar eclipse for the Bay Area since 2011. This is for the night owls among us. UC students, staff, and faculty are admitted Speaker 3: free; general admission is $10. Drought and Deluge: How Applied Hydroinformatics Are Becoming Standard Operating Data for All Californians is the title of Joshua Viers's presentation on Wednesday, [00:25:30] April 16th. Joshua Viers joined the CITRIS leadership as the director at UC Merced in August 2013; prior to this, Dr. Viers served in a research capacity at UC Davis for 10 years after receiving his Ph.D. in ecology. This event is free and will be held in Sutardja Dai Hall, Banatao Auditorium, Wednesday, April 16th at noon. A feature of Spectrum is to present news stories we find interesting; here are two. [00:26:00] This story relates to today's interview on big data. On Tuesday, April 1st, a workshop titled Big Data: Values and Governance was held at UC Berkeley. The workshop was hosted by the White House Office of Science and Technology Policy, the UC Berkeley School of Information, and the Berkeley Center for Law and Technology. 
The day-long workshop examined policy and governance questions raised by the use of large and complex data sets and sophisticated analytics to [00:26:30] fuel decision-making across all sectors of the economy, academia, and government. Speaker 3: Four panels, each an hour and a half long, framed the issues of values and governance. A webcast of this workshop will be available from the iSchool webpage by today or early next week; that's ischool.berkeley.edu. Vast gene expression map yields neurological and environmental stress insights: Dan Krotz, [00:27:00] writing for the Lawrence Berkeley Lab News Center, reports that a consortium of scientists led by Susan Celniker of Berkeley Lab's Life Sciences Division has conducted the largest survey yet of how information encoded in an animal genome is processed in different organs, stages of development, and environmental conditions. Their findings paint a new picture of how genes function in the nervous system and in response to environmental stress. The scientists [00:27:30] studied the fruit fly, an important model organism in genetics research. In all organisms, the information encoded in genomes is transcribed into RNA molecules that are either translated into proteins or utilized to perform functions in the cell. The collection of RNA molecules expressed in a cell is known as its transcriptome, which can be thought of as the readout of the genome. Speaker 3: While the genome is essentially [00:28:00] the same in every cell in our bodies, the transcriptome is different in each cell type and is constantly changing. Cells in cardiac tissue are radically different from those in the gut or the brain, for example. Ben Brown of Berkeley Lab said, our study indicates that the total information output of an animal transcriptome is heavily weighted by the needs of the developing nervous system. 
The scientists also discovered a much broader [00:28:30] response to stress than previously recognized. Exposure to heavy metals like cadmium resulted in the activation of known stress-response pathways that prevent damage to DNA and proteins. It also revealed several new genes of completely unknown function. Speaker 7: You can [inaudible]. Hmm. Speaker 3: The music heard during the show [00:29:00] was [inaudible] Speaker 5: produced by Alex Simon. Today's interview [inaudible] Rao. If you have comments about the show, please send them to us at spectrum.kalx@yahoo.com. [00:29:30] Join us in two weeks at this same time. [inaudible]. See acast.com/privacy for privacy and opt-out information.
Cathryn Carson is an Associate Professor of History and the Operational Lead of the Social Sciences D-Lab at UC Berkeley. Fernando Perez is a research scientist at the Henry H. Wheeler Jr. Brain Imaging Center at UC Berkeley. Berkeley Institute for Data Science. Transcript: Speaker 1: Spectrum's next. Speaker 2: Okay. [inaudible] [inaudible]. Speaker 1: Welcome to Spectrum, the science [00:00:30] and technology show on KALX Berkeley, a biweekly 30-minute program bringing you interviews featuring Bay Area scientists and technologists, as well as a calendar of local events and news. Speaker 3: Hi, good afternoon. My name is Brad Swift. I'm the host of today's show. This week on Spectrum we present part one of our two-part series on big data at Cal. The Berkeley Institute for Data Science, or BIDS, is only [00:01:00] four months old. Two people involved with shaping the institute are Cathryn Carson and Fernando Perez, and they are our guests. Cathryn Carson is an associate professor of history, associate dean of social sciences, and the operational lead of the social sciences data lab at UC Berkeley. Fernando Perez is a research scientist at the Henry H. Wheeler Jr. Brain Imaging Center at UC Berkeley. He created the IPython project while a graduate student in 2001 [00:01:30] and continues to lead the project. Here is part one: Cathryn Carson and Fernando Perez. Welcome to Spectrum. Thanks for having us. And I wanted to get from both of you a short summary of the work you're doing now, the activity that predates your interest in data science. Speaker 4: Data science is kind of an ill-defined term, I think, and it's still an open question precisely what it is. But in a certain sense all of my research has probably been under the umbrella [00:02:00] of what we today call data science since the start. 
I did my Ph.D. in particle physics, but it was computational particle physics, and I was doing data analysis, in that case of models that were computationally created. So I've sort of been doing this really since I was a graduate student. What has changed over time is the breadth of disciplines that are interested in these kinds of problems, in these kinds of tools, and that have these kinds of questions. In physics this has been a common way of working for a long time: the deep intersection [00:02:30] between computational tools and large data sets, whether they were created by models or collected experimentally, is something that has a long history in physics. Speaker 4: How long? The first computers were created to solve differential equations, to plot the trajectories of ballistic missiles. That was one of the very first tasks that computers were created for, so almost since the dawn of computing. It's really only recently, though, that the size of the data sets has really jumped. Yes, the size has grown very, [00:03:00] very large in the last couple of decades, especially in the last decade. But I think it's important not to get too hung up on the issue of size, because when we talk about data science, I like to define it in the context of data that is large for the traditional framework, tools, and conceptual structure of a given discipline, rather than by its raw absolute size. Yes, in physics, for example, we have some of the largest data sets in existence, things like what the LHC creates [00:03:30] for the Higgs boson. Those data sets are just absolutely, absurdly large. But in a given discipline, five megabytes of data might be a lot, depending on what it is that you're trying to ask. 
And so I think it's much, much more important to think of data that has grown larger than a given discipline was used to manipulating, and that therefore poses interesting challenges for that domain, rather than to be completely focused on the raw size of the data. Speaker 1: I approach this from an angle that's actually complementary to Fernando's, in part because [00:04:00] my job as the interim director of the social sciences data laboratory is not to do data science but to provide the infrastructure, the setting, for researchers across the social sciences here who are doing that for themselves. And exactly in the social sciences you see a nice exemplification of the challenge of larger sizes of data than were previously used, and of new kinds of data as well. So the social sciences are starting to pick up, say, on [00:04:30] sensor data that has been placed in environmental settings in order to monitor human behavior. And social scientists can then use that in order to design tests around it, or to develop ways of interpreting it, to answer research questions that were not necessarily anticipated by the folks who put the sensors in place; or accessing data that comes out of human interactions online, which is created for entirely different purposes [00:05:00] but makes it possible for social scientists to understand things about human social networks. Speaker 1: So the challenge of building capacity for disciplines to move into new scales of data sets and new kinds of data sets is one of the ones that I've been seeing as I've been building up D-Lab, and that we've jointly been seeing as we've tried to help scope out what the task of the Berkeley Institute for Data Science is going to be. How about the emergence [00:05:30] of data science? Do you have a sense of the timeline, when you started to take note of its feasibility for the social sciences, irrespective of physics, which has a longer history? 
One of the things that's been driving the conversation in the social sciences is actually the funding regime, in that the existing, beautifully curated data sets that we have from the post-World War Two period, survey data principally, administrative data on top of that, [00:06:00] are extremely expensive to produce and to curate and maintain. Speaker 1: And as the social sciences, in only the last five to 10 years, have been weighing the portfolio of data sources that are supported by funding agencies, we've been forced to confront the fact that the maintenance of the post-World War Two regime of surveying may not be feasible into the future, and that we're going to have to be shifting to other kinds of data that are generated [00:06:30] for other purposes, repurposing and reusing them, finding new ways to cut and slice them in order to answer new kinds of questions that weren't accessible to the old surveys either. So one way to approach it is through the infrastructure that's needed to generate the data we're looking at. Another way is simply to look at the infrastructure on campus. One of the launching impetuses for the social sciences data laboratory was in fact the budget cuts of 2009 [00:07:00] here on campus, when we acknowledged that if we were going to support cutting-edge, methodologically innovative social science on this campus, we were going to need to find ways to repurpose existing assets and redirect them toward whatever this new frontier in social science is going to be. Speaker 5: You are listening to Spectrum on KALX Berkeley. Cathryn Carson and Fernando Perez are our guests. [00:07:30] They are part of the Berkeley Institute for Data Science, known as BIDS. Speaker 4: Fernando, you sort of gave us a generalized definition of data science. Do you want to give it another go, just in case you evoke something else? Sure. 
I want to leave that question slightly on answer because I feel that to some extent, one of the challenges we have as an intellectual effort that we're trying to tackle at the Brooklyn [00:08:00] instead for data science is precisely working on what this field is. Right. I don't want to presuppose that we have a final answer on this question, but at least we, we do know that we have some elements to frame the question and I think it's mostly about an intersection. It's about an intersection of things that were being done already on their own, but that were being done often in isolation. So it's the intersection of methodological work whereby that, I mean things like statistical theory, applied mathematics, computer science, [00:08:30] algorithm development, all of the computational and theoretical mathematical machinery that has been done traditionally, the questions arising from domain disciplines that may have models that may have data sets, that may have sensors that may have a telescope or that may have a gene sequencing array and where are they have their own theoretical models of their organisms or galaxies or whatever it is and where that data can be inscribed and the fact that tools need to be built. Speaker 4: Does data doesn't get analyzed by blackboards? Those data gets analyzed by software, but this is software that is deeply woven [00:09:00] into the fabric of these other two spaces, right? It's software that has to be written with the knowledge of the questions and the discipline and the domain and also with the knowledge of the methodology, the theory. 
It's that intersection of this triad of things of concrete representation in computational machinery, abstract ideas and methodologies and domain questions that in many ways creates something new when the work has to be done simultaneously with enough depth and enough rigor on all [00:09:30] of these three directions and precisely that intersection is where now the bottleneck is proving to be because you can have the ideas, you can have the questions, you can have the data, you can have the the fear m's, but if you can't put it all together into working concrete tools that you can use efficiently and with a reasonably rapid turnaround, you will not be able to move forward. You will not be able to answer the questions you want to answer about your given discipline and so that embodiment of that intersection is I think where the challenge is opposed. Maybe there is something new called [00:10:00] data science. I'd actually like to suggest that Speaker 1: the indefinable character of data science is actually not a negative because it's an intersection in a way that we're all still very much struggling. How to define it. I won't underplay that exactly in that it's an intersection. It points to the fact that it's not an intellectual thing that we're trying to get our heads around. It's a platform for activity for doing kinds of research that are either enabled or hindered by the [00:10:30] existing institutional and social structures that the research is getting done in, and so if you think of it less as a kind of concept or an intellectual construct and more of a space where people come together, either a physical space or a methodological sharing space, you realize that the indefinable ness is a way of inviting people in rather than drawing clear boundaries around it and saying, we know what this is. It is x and not Speaker 4: why [00:11:00] Berkeley Institute for data science is that where it comes in this invitation, this collection of people and the intersection. 
That's sort of the goal of it. Speaker 1: That's what we've been asked to build it as not as uh, an institute in the traditional sense of there are folks inside and outside, but in the sense of a meeting point and a crossing site for folks across campus. That's [00:11:30] something that's been put in front of us by the two foundations who have invested in a significant sum of money in us. That's the Gordon and Betty Moore Foundation and the Alfred p Sloan Foundation. And it's also become an inspiring vision for those of us who have been engaged in the process over the last year and a half of envisioning what it might be. It's an attempt to address the doing of data science as an intersectional area within a research university that has existing structures [00:12:00] and silos and boundaries within it. Speaker 4: And to some extent you try to deconstruct the silos and leverage the work done by one group, share it with another, you know, the concrete mechanisms are things that we're still very much working on it and we will see how it unfolds. There's even a physical element that reflects this idea of being at a crossroads, which is that the university was willing to commit to [inaudible] the physical space of one room in the main doe library, which is not only physically [00:12:30] at the center of the university and that is very important because it does mean that it is quite literally at the crossroads. It is one central point where many of us walk by frequently, so it's a space that is inviting in that sense too to encounters, to stopping by to having easy collaboration rather than being in some far edge corner of the campus. Speaker 4: But also intellectually the library is traditionally the store of the cultural and scientific memory of an institution. 
And so building this space in the library is a way of signaling [00:13:00] to our community that it is meant to be a point of encounter and how specifically those encounters will be embodied and what concrete mechanisms of sharing tools, sharing coach, showing data, having lecture series, having joint projects. We're in the process of imagining all of that and we're absolutely certain that we'll make some mistakes along the way, but that is very much the intent is to have something which is by design about as openly and as explicitly collaborative as we can make it and I think [00:13:30] in that sense we are picking up on many of the lessons that Catherine and her team at the d lab have already learned because the d lab has been in operation here in Barrows Hall for about a year and has already done many things in that direction and that at least I personally see them as things in the spirit of what bids is attempting to do at the scale of the entire institution. D Lab has been kind of blazing that trail already for the last year in the context of the social sciences and to the point where their impact has actually spread beyond the social sciences because so many of the things that they were doing or were [00:14:00] found to have very thirsty customers for the particular brand of lemonade that they were selling here at the lab. And their impact has already spread beyond the social sciences. But we hope to take a lot of these lessons and build them with a broader scope. Speaker 1: And in the same way BYD sits at the center of other existing organizations, entities, programs on campus, which are also deeply engaged in data science. And some of them are research centers, others of them are the data science masters program in the School of information where [00:14:30] there is a strong and deliberate attempt to think through how in a intelligent way to train people for outside the university doing data science. 
So all of these centers of excellence on campus have the potential to get networked in a much more synergistic way with the existence of BIDS, which is by no means encompassing all of the great work that's getting done in teaching and research around data science on this campus. Speaker 6: [00:15:00] Spectrum is a public affairs show on KALX Berkeley. Our guests are Cathryn Carson and Fernando Perez. In the next segment they talk about challenges the Berkeley Institute for Data Science faces. Speaker 2: [inaudible] Speaker 3: And it seems that eScience does happen best in teams, in multidisciplinary [00:15:30] teams, or is that not really the case? Speaker 1: I think we've been working on that assumption, in part because it seems too much to ask of any individual to do all the things at once. At the same time, we do have many specimens of individuals who cross the boundaries of the three areas that Fernando was sketching out: domain area expertise, hacking skills, and methodological competence. [00:16:00] And it's interesting to think through the intersectional individuals as well. But that said, the default assumption, I think, is going to have to be that teamwork, collaboration, and actually all of the social engineering to make that possible is going to be necessary for data science to flourish. And again, that's one of the challenges of working in a research university setting, where teamwork is sometimes prized and sometimes deprecated. Speaker 4: That goes back to the incentives. People building tools don't necessarily get much attention or [00:16:30] prestige from that. How do you defeat that on an institutional level, within the institute or just the community? Ask us in five years if we had any success. That's one of the central challenges that we have, and it's not only here at Berkeley; there's actually kind of an ongoing worldwide conversation happening about this every few days.
There's another article where this issue keeps being brought up again and again, and it's rising in volume. The business of creating tools is actually becoming an increasing [00:17:00] part of the job of people doing science. And so, for example, even young faculty who are on the tenure track are finding themselves kind of pushed against the wall, because they're finding themselves writing a lot of tools and building a lot of software, having to do it collaboratively, having to engage others, picking up all of these skills, and this being an important, central part of their work. Speaker 4: But they feel that if their tenure committee is only going to look at their publication record, and [00:17:30] 80% of their actual time went into building these things, they are effectively being shortchanged for their effort. And this is a difficult conversation. What are we going to do about it? We have a bunch of ideas; we are going to try many things. I think it's a conversation that has to happen at many levels. Some agencies are beginning: the NSF recently changed the terms of its biosketch requirements, for example. Now the section that used to be called relevant publications is called relevant publications and other research outcomes, and in parentheses they explain, such as software [00:18:00] projects, et cetera. So this is beginning to change. The community that curates, for example, large data sets: that's a community that has very similar concerns. It turns out that working on a rich and complex data set may be a labor that requires years of intensive work, and maybe a full-time endeavor for someone. Speaker 4: And yet those people may end up actually getting little credit for it, because maybe they weren't the ones who used that data set to answer a specific question. But if they're left in the dust, no one will do that job. Right?
And so [00:18:30] we need to acknowledge that these tasks are actually becoming a central part of the intellectual effort of research. And maybe one point that is worth mentioning in this context of incentives and careers is that we, as the institution of academic science in a broad sense, are facing the challenge today that these career paths, and these kinds of intersectional problems in data science, are right now extremely highly valued by industry. [00:19:00] What we're seeing today with this problem is genuinely of a different scale, and different enough to merit attention and consideration in its own right. Because what's happening is that the people who have this intersection of skills, talents, and competencies are extraordinarily well regarded by industry right now, especially here in the Bay Area. Speaker 4: I know the companies that are trying to hire, and I know that people are going there, and the good ones can effectively name their price to go into contexts that are not [00:19:30] boring. A lot of the problems that industry has right now with data are actually genuinely interesting problems, and they often have datasets that we in academia actually have no access to, because it turns out that these days the amount of data that is being generated by web activity, by apps, by personal devices that create and upload data, is actually spectacular. And some of those data sets are really rich and complex, and material for interesting work. And industry also has the resources, the computational resources, the backend, the engineering expertise [00:20:00] to do interesting work on those problems. And so we as an academic institution are facing the challenge that we are making it very difficult for these people to find a space at the university.
Yet they are critical to the success of modern data-driven research and discovery, and yet across the street they are being courted by an industry that isn't just offering them money to do boring work. It's actually offering them respect: yes, compensation, but also respect and intellectual space, and a community that values their work. And that's something [00:20:30] that is genuinely an issue for us to consider. Speaker 4: Is there a way to cross-pollinate between the academic side and industry, and work together on building a toolkit? Absolutely. We've had great success in that regard in the last decade with the space that I'm most embedded in, which is the space of open-source scientific computing tools in Python. We have a licensing model for most of the tools in our space that [00:21:00] is open source but allows for very easy industry reuse, and what we find is that that has enabled a very healthy two-way dialogue between industry and academia in this context. Yes, industry uses our tools, and they often use them in a proprietary context, but they use them for their own problems and for building their own domain-specific products and whatever. But when they want to contribute to the base tool, the base layer if you will, it's much [00:21:30] easier for them. Speaker 4: They simply make the improvements out in the open, or they just donate resources; they donate money. Microsoft Research last year made a $100,000 donation to the IPython project, which was strictly a donation. This was not a grant to develop any specific feature. This was a blanket "hey, we use your tools and they help what we build, and so we would like to support you." And we've had a very productive relationship with them in the past, but it's by no means the only one here at Berkeley.
The AMPLab's two co-directors are actually part of the team [00:22:00] that is working on BIDS: Ion Stoica and Mike Franklin. The AMPLab has a very large set of tools for data analytics at scale that is now widely used at Twitter and Facebook and many other places. They have industry-oriented conferences around their tools now; they have a conference they run twice per year, and large bootcamps, and large fractions of their attendees come from industry, because industry is using all of these tools. And currently more of the AMPLab's funding [00:22:30] comes from industry than from sources like the NSF. And so I think there are actually very, very clear and unambiguous examples of models where the open-source work that is coming out of our research universities can have a highly productive and valuable dialogue with industry. Speaker 3: It seems like, long term, you would have a real uphill battle to create enough competent people trained with data to [00:23:00] quench both industry and academia, so that there would be a calming of the flow out of academia. Speaker 4: As we've said a couple of times in our discussions, this is a problem, uh, a very, very challenging set of problems that we've signed up for. But we feel that it's a problem worth failing on, in the sense that we know the challenge is a steep one, but at the same time the questions are important enough to be worth making the effort. Speaker 6: [inaudible] [00:23:30] Don't miss part two of this interview in two weeks, on the next edition of Spectrum. Spectrum shows are archived on iTunes U. We've created a simple link for you; the link is tinyurl.com/kalxspectrum. Now, here are some of the science and technology events happening, Speaker 3: I mean locally, over the next two weeks. [00:24:00] Enabling a Sustainable Energy Infrastructure is the title of David Culler's presentation.
David Culler is the faculty director of [inaudible] for Energy and the chair of Computer Science at UC Berkeley. He was selected in Scientific American's Top 50 Researchers and in Technology Review's 10 Technologies That Will Change the World. His research addresses networks of small embedded wireless devices, planetary-scale Internet services, parallel computer architecture, [00:24:30] parallel programming languages, and high-performance communications. This event is free and will be held in Sutardja Dai Hall, Banatao Auditorium, Wednesday, April 9th at noon. Cal Day is April 12th, 8:00 AM to 6:00 PM, with 357 events; for details, go to the website, calday.berkeley.edu. A lunar eclipse: Monday, April 14th at 11:00 PM, [00:25:00] look through astronomical telescopes at the Lawrence Hall of Science to observe the first total lunar eclipse for the Bay Area since 2011. This is for the night owls among us. UC students, staff, and faculty are admitted Speaker 3: free; general admission is $10. Drought and Deluge: How Applied Hydroinformatics Are Becoming Standard Operating Data for All Californians is the title of Joshua Viers's presentation on Wednesday, [00:25:30] April 16th. Joshua Viers joined the CITRIS leadership as the director at UC Merced in August 2013. Prior to this, Dr. Viers had been serving in a research capacity at UC Davis for 10 years, since receiving his PhD in ecology. This event is free and will be held in Sutardja Dai Hall, Banatao Auditorium, Wednesday, April 16th at noon. A feature of Spectrum is to present news stories we find interesting; here are two. [00:26:00] This story relates to today's interview on big data. On Tuesday, April 1st, a workshop titled Big Data: Values and Governance was held at UC Berkeley. The workshop was hosted by the White House Office of Science and Technology Policy, the UC Berkeley School of Information, and the Berkeley Center for Law and Technology.
The day-long workshop examined policy and governance questions raised by the use of large and complex data sets and sophisticated analytics to [00:26:30] fuel decision making across all sectors of the economy, academia, and government. Four panels, Speaker 3: each an hour and a half long, framed the issues of values and governance. A webcast of this workshop will be available from the iSchool webpage by today or early next week; that's ischool.berkeley.edu. Vast gene expression map yields neurological and environmental stress insights: Dan Krotz, [00:27:00] writing for the Lawrence Berkeley Lab News Center, reports that a consortium of scientists led by Susan Celniker of Berkeley Lab's Life Sciences Division has conducted the largest survey yet of how information encoded in an animal genome is processed in different organs, stages of development, and environmental conditions. Their findings paint a new picture of how genes function in the nervous system and in response to environmental stress. The scientists [00:27:30] studied the fruit fly, an important model organism in genetics research. In all organisms, the information encoded in genomes is transcribed into RNA molecules that are either translated into proteins or utilized to perform functions in the cell. The collection of RNA molecules expressed in a cell is known as its transcriptome, which can be thought of as the readout of the genome. Speaker 3: While the genome is essentially [00:28:00] the same in every cell in our bodies, the transcriptome is different in each cell type and constantly changing. Cells in cardiac tissue are radically different from those in the gut or the brain, for example. Ben Brown of Berkeley Lab said, our study indicates that the total information output of an animal transcriptome is heavily weighted by the needs of the developing nervous system.
The scientists also discovered a much broader [00:28:30] response to stress than previously recognized. Exposure to heavy metals like cadmium resulted in the activation of known stress-response pathways that prevent damage to DNA and proteins. It also revealed several new genes of completely unknown function. Speaker 7: You can [inaudible]. Hmm. Speaker 3: The music heard during the show [00:29:00] was [inaudible] Speaker 5: produced by Alex Simon. Today's interview was with [inaudible] Rao. If you have comments about the show, please send them to us at [00:29:30] spectrum.kalx@yahoo.com. Same time. [inaudible]. Hosted on Acast. See acast.com/privacy for more information.