Podcasts about deep neural networks

Branch of machine learning

72PODCASTS
112EPISODES
47mAVG DURATION
1MONTHLY NEW EPISODE
Nov 22, 2025LATEST

POPULARITY

20172018201920202021202220232024

Best podcasts about deep neural networks

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

5 episodes with deep neural networks

VUX World

6 episodes with deep neural networks

Machine Learning Podcast - Jay Shah

3 episodes with deep neural networks

Data Science at Home

3 episodes with deep neural networks

Mind & Matter

2 episodes with deep neural networks

PaperPlayer biorxiv neuroscience

9 episodes with deep neural networks

Investorideas -Trading & News

2 episodes with deep neural networks

The AI Eye: stock news & deal tracker

2 episodes with deep neural networks

Chai Time Data Science

2 episodes with deep neural networks

Latest podcast episodes about deep neural networks

505 Bumpers 89

Reversim Podcast

Play Episode Listen Later Nov 22, 2025

פרק מספר 505 של רברס עם פלטפורמה - באמפרס מספר 89, שהוקלט ב-13 בנובמבר 2025, רגע אחרי כנס רברסים 2025 [יש וידאו!]: רן, דותן ואלון (והופעת אורח של שלומי נוח!) באולפן הוירטואלי עם סדרה של קצרצרים מרחבי האינטרנט: הבלוגים, ה-GitHub-ים, ה-Claude-ים וה-GPT-ים החדשים מהתקופה האחרונה.

amazon ai business spoilers service state video performance devil selling trade chatgpt tesla union memory enemy production consistency software cloud experiments doom followers agent projects sci fi context ship patterns framework chat characters consistent vibe resilient powered pattern bots delay doc loop openai garbage arc user nvidia monitoring rust ux api sensitive frame dump bullet real time sweep generate open source gpt python ui aws contributors valuations risky server subscription databases github 500k conductor azure samples black box llm output prompt battlestar galactica high level extensive desktops interface spreadsheets cascade canary dns workload grok sql blueprints cloudflare dynamo guardrails rag repo kanban lambda embedding pointer rollback stop motion zed generates query retrieval rendering green tea agentic serverless neural networks mcp chromium typescript folder indexing captcha ppt vs code multicloud timeouts bloop bumpers polymarket dhh clustering cohere ec2 memori clean code code reviews dynamodb instructive langchain ragas mcps 100gb neurips us east iy deep neural networks key value garbage collector

#144 Why is Bayesian Deep Learning so Powerful, with Maurizio Filippone

Learning Bayesian Statistics

Play Episode Listen Later Oct 30, 2025 88:22 Transcription Available

Sign up for Alex's first live cohort, about Hierarchical Model building!Get 25% off "Building AI Applications for Data Scientists and Software Engineers"Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work!Visit our Patreon page to unlock exclusive Bayesian swag ;)Takeaways:Why GPs still matter: Gaussian Processes remain a go-to for function estimation, active learning, and experimental design – especially when calibrated uncertainty is non-negotiable.Scaling GP inference: Variational methods with inducing points (as in GPflow) make GPs practical on larger datasets without throwing away principled Bayes.MCMC in practice: Clever parameterizations and gradient-based samplers tighten mixing and efficiency; use MCMC when you need gold-standard posteriors.Bayesian deep learning, pragmatically: Stochastic-gradient training and approximate posteriors bring Bayesian ideas to neural networks at scale.Uncertainty that ships: Monte Carlo dropout and related tricks provide fast, usable uncertainty – even if they're approximations.Model complexity ≠ model quality: Understanding capacity, priors, and inductive bias is key to getting trustworthy predictions.Deep Gaussian Processes: Layered GPs offer flexibility for complex functions, with clear trade-offs in interpretability and compute.Generative models through a Bayesian lens: GANs and friends benefit from explicit priors and uncertainty – useful for safety and downstream decisions.Tooling that matters: Frameworks like GPflow lower the friction from idea to implementation, encouraging reproducible, well-tested modeling.Where we're headed: The future of ML is uncertainty-aware by default – integrating UQ tightly into optimization, design, and deployment.Chapters:08:44 Function Estimation and Bayesian Deep Learning10:41 Understanding Deep Gaussian Processes25:17 Choosing Between Deep GPs and Neural Networks32:01 Interpretability and Practical Tools for GPs43:52 Variational Methods in Gaussian Processes54:44 Deep Neural Networks and Bayesian Inference01:06:13 The Future of Bayesian Deep Learning01:12:28 Advice for Aspiring Researchers

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 5, 2025 85:21

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss Weight Watcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher's ability to detect three distinct learning phases—underfitting, grokking, and generalization collapse—and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field. The complete show notes for this episode can be found at https://twimlai.com/go/734.

ai training collapse dynamics weight watchers rag generalization charles martin deep neural networks grokking

AI Energy Measurement for Beginners

Environment Variables

Play Episode Listen Later Mar 6, 2025 56:55

Host Chris Adams is joined by Charles Tripp and Dawn Nafus to explore the complexities of measuring AI's environmental impact from a novice's starting point. They discuss their research paper, A Beginner's Guide to Power and Energy Measurement and Estimation for Computing and Machine Learning, breaking down key insights on how energy efficiency in AI systems is often misunderstood. They discuss practical strategies for optimizing energy use, the challenges of accurate measurement, and the broader implications of AI's energy demands. They also highlight initiatives like Hugging Face's Energy Score Alliance, discuss how transparency and better metrics can drive more sustainable AI development and how they both have a commonality with eagle(s)!

ai power guide energy beginners machine learning computing measurement energy efficiency responsible ai estimation chris adams energy consumption deep neural networks green software foundation

Copilots for Running Machines? GenAI Mindset and New Industrial Tech Jobs

The Optimistic Outlook

Play Episode Listen Later Feb 11, 2025 24:15

AI copilots have changed a range of professions, from healthcare to finance, by automating tasks and enhancing productivity. But can copilots also create value for people performing more mechanical, hands-on tasks or figuring out how to bring factories online? In this episode, Barbara welcomes Olympia Brikis, Director of AI Research at Siemens, to show how generative AI is shaping new industrial tech jobs at the convergence of the real and digital worlds. Olympia sheds light on the unique career opportunities in AI and what it takes to thrive in this dynamic, emerging field. Whether you're a tech enthusiast or someone curious about tech careers, this episode offers a unique perspective on how AI is reshaping the landscape of mechanical and industrial professions. Tune in to learn about the exciting innovations and the future of AI in industry! Show notes In this episode, Barbara asks Olympia to share some resources that can help all of us get smarter on industrial AI. Here are Olympia's recommendations: For everyone just getting started with (Generative) AI: Elements of AI – great for learning how AI work and what it is https://www.elementsofai.com/ Generative AI for Everyone: https://www.coursera.org/learn/generative-ai-for-everyone Co-Intelligence: Living and Working with AI, by Ethan Mollick For those want to dive deeper into the technical aspects of Deep Neural Networks and Generative AI: Deep Learning Specialization: https://www.coursera.org/specializations/deep-learning Stanford University Lecture CS336: Language Modeling from Scratch https://stanford-cs336.github.io/spring2024/

director ai mindset running tech jobs industrial machines siemens genai ai research deep neural networks

Mind & Matter

Play Episode Listen Later Jan 6, 2025 97:23

Subscriber-only episodeSend us a textShort Summary: A deep dive into the enigmatic world of sleep, exploring its biological functions, evolutionary origins, and the diverse manifestations across different species.Note: Podcast episodes are fully available to paid subscribers on the M&M Substack and on YouTube. Partial versions are available elsewhere.About the Guest: Vlad Vyazovskiy, PhD is a Professor of Sleep Physiology at the Department of Physiology, Anatomy, and Genetics at Oxford University.Key Takeaways:Sleep as a Mystery: Despite extensive research, the fundamental reason why animals sleep remains elusive, with no comprehensive theory yet agreed upon.Local Sleep Phenomenon: Sleep might not be a whole-brain event; even within a sleeping brain, different areas can be in different states of activity or rest.Sleep in Animals: Sleep varies widely among species, from micro-sleeps in penguins to unihemispheric sleep in dolphins, suggesting sleep could serve multiple, context-dependent functions.Synaptic Homeostasis: The hypothesis suggests that sleep could be crucial for renormalizing synaptic connections formed during wakefulness, although this idea is still under scrutiny.Hibernation & Torpor: These states relate to sleep but involve significant metabolic changes, possibly acting as survival mechanisms by conserving energy and reducing detectability by predators.Psychedelics & Sleep: Research shows psychedelics like 5-MeO-DMT can induce states where animals show signs of sleep in their brain activity while physically active, hinting at complex interactions between brain states and consciousness.Related episodes:M&M #43: Sleep, Dreaming, Deep Neural Networks, Machine Learning & Artificial Intelligence, Overfitted Brain Hypothesis, Evolution of Fiction & Art | Erik HoelM&M #16: Sleep, Dreams, Memory & the Brain | Bob Stickgold*Not medical advice.All episodes (audio & video), show notes, transcripts, and more at the M&M Substack Affiliates: MASA Chips—delicious tortilla chips made from organic corn and grass-fed beef tallow. No seed oils or artificial ingredients. Use code MIND for 20% off. Lumen device to optimize your metabolism for weight loss or athletic performance. Use code MIND for 10% off. Athletic Greens: Comprehensive & convenient daily nutrition. Free 1-year supply of vitamin D with purchase. KetoCitra—Ketone body BHB + potassium, calcium & magnesium, formulated with kidney health in mind. Use code MIND20 for 20% off any subscription. Learn all the ways you can support my efforts

Why Do Animals Sleep? | Vlad Vyazovskiy | 202

Mind & Matter

Play Episode Listen Later Jan 6, 2025 50:00

Send us a textShort Summary: A deep dive into the enigmatic world of sleep, exploring its biological functions, evolutionary origins, and the diverse manifestations across different species.Note: Podcast episodes are fully available to paid subscribers on the M&M Substack and on YouTube. Partial versions are available elsewhere.About the Guest: Vlad Vyazovskiy, PhD is a Professor of Sleep Physiology at the Department of Physiology, Anatomy, and Genetics at Oxford University.Key Takeaways:Sleep as a Mystery: Despite extensive research, the fundamental reason why animals sleep remains elusive, with no comprehensive theory yet agreed upon.Local Sleep Phenomenon: Sleep might not be a whole-brain event; even within a sleeping brain, different areas can be in different states of activity or rest.Sleep in Animals: Sleep varies widely among species, from micro-sleeps in penguins to unihemispheric sleep in dolphins, suggesting sleep could serve multiple, context-dependent functions.Synaptic Homeostasis: The hypothesis suggests that sleep could be crucial for renormalizing synaptic connections formed during wakefulness, although this idea is still under scrutiny.Hibernation & Torpor: These states relate to sleep but involve significant metabolic changes, possibly acting as survival mechanisms by conserving energy and reducing detectability by predators.Psychedelics & Sleep: Research shows psychedelics like 5-MeO-DMT can induce states where animals show signs of sleep in their brain activity while physically active, hinting at complex interactions between brain states and consciousness.Related episodes:M&M #43: Sleep, Dreaming, Deep Neural Networks, Machine Learning & Artificial Intelligence, Overfitted Brain Hypothesis, Evolution of Fiction & Art | Erik HoelM&M #16: Sleep, Dreams, Memory & the Brain | Bob Stickgold*Not medical advice.Support the showAll episodes (audio & video), show notes, transcripts, and more at the M&M Substack Affiliates: MASA Chips—delicious tortilla chips made from organic corn and grass-fed beef tallow. No seed oils or artificial ingredients. Use code MIND for 20% off. Lumen device to optimize your metabolism for weight loss or athletic performance. Use code MIND for 10% off. Athletic Greens: Comprehensive & convenient daily nutrition. Free 1-year supply of vitamin D with purchase. KetoCitra—Ketone body BHB + potassium, calcium & magnesium, formulated with kidney health in mind. Use code MIND20 for 20% off any subscription. Learn all the ways you can support my efforts

Sleep: Neural Circuits, Orexin/Hypocretin, Hypothalamus, Neuromodulators, Stress & Cortisol, Sleep Drugs & Ultrasound Technology | Luis de Lecea | #168

Mind & Matter

Play Episode Listen Later Aug 5, 2024 99:31 Transcription Available

Send us a Text Message.About the guest: Luis de Lecea, PhD is a neurobiologist whose lab at Stanford University studies the neural basis of sleep & wakefulness in animals.Episode summary: Nick and Dr. de Lecea discuss: the neural basis of sleep; sleep architecture & sleep phases (NREM vs. REM sleep); orexin/hypocretin neurons & the lateral hypothalamus; cortisol & stress; circadian rhythms; neuromodulators (norepinephrine, dopamine, etc); sleep across animal species; sleep drugs; ultrasound technology; and more.Related episodes:Sleep, Dreaming, Deep Neural Networks, Machine Learning & Artificial Intelligence, Overfitted Brain Hypothesis, Evolution of Fiction & Art | Erik Hoel | #43Consciousness, Anesthesia, Coma, Vegetative States, Sleep Pills (Ambien), Ketamine, AI & ChatGPT | Alex Proekt | #101*This content is never meant to serve as medical advice.Support the Show.All episodes (audio & video), show notes, transcripts, and more at the M&M Substack Try Athletic Greens: Comprehensive & convenient daily nutrition. Free 1-year supply of vitamin D with purchase.Try SiPhox Health—Affordable, at-home bloodwork w/ a comprehensive set of key health marker. Use code TRIKOMES for a 10% discount.Try the Lumen device to optimize your metabolism for weight loss or athletic performance. Use code MIND for 10% off.Learn all the ways you can support my efforts

#107 Amortized Bayesian Inference with Deep Neural Networks, with Marvin Schmitt

Learning Bayesian Statistics

Play Episode Listen Later May 29, 2024 81:37 Transcription Available

Proudly sponsored by PyMC Labs, the Bayesian Consultancy. Book a call, or get in touch!My Intuitive Bayes Online Courses1:1 Mentorship with meIn this episode, Marvin Schmitt introduces the concept of amortized Bayesian inference, where the upfront training phase of a neural network is followed by fast posterior inference.Marvin will guide us through this new concept, discussing his work in probabilistic machine learning and uncertainty quantification, using Bayesian inference with deep neural networks. He also introduces BaseFlow, a Python library for amortized Bayesian workflows, and discusses its use cases in various fields, while also touching on the concept of deep fusion and its relation to multimodal simulation-based inference.A PhD student in computer science at the University of Stuttgart, Marvin is supervised by two LBS guests you surely know — Paul Bürkner and Aki Vehtari. Marvin's research combines deep learning and statistics, to make Bayesian inference fast and trustworthy. In his free time, Marvin enjoys board games and is a passionate guitar player.Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !Thank you to my Patrons for making this episode possible!Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor,, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau, Luis Fonseca, Dante Gates, Matt Niccolls, Maksim Kuznecov, Michael Thomas, Luke Gorrie, Cory Kiser, Julio, Edvin Saveljev, Frederick Ayala, Jeffrey Powell, Gal Kampel, Adan Romero, Will Geary and Blake Walters.Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag ;)Takeaways:Amortized Bayesian inference...

Ramblings of a Designer eps. 158 - Diego Sabajo

Ramblings of a Designer podcast

Play Episode Listen Later May 10, 2024 39:12

Today we sat down with Diego and a little bit more about him: I am a Data Scientist / ML (Maching Learning) engineer, born in Suriname, South America, I also have a background in music production. Where I've built a business around it, with a global client base. I transitioned into tech first starting and working as a front end developer and then transitioned again into machine learning. Link to the Simplilearn / Caltech video: https://www.youtube.com/watch?v=Qlw1tqY4_vc&t=19s Link to my Mindkeyz music production channel: https://www.youtube.com/@Mindkeyz Ramblings of a Designer podcast is a monthly design news and discussion podcast hosted by Laszlo Lazuer and Terri Rodriguez-Hong (@flaxenink, insta: flaxenink.design). Facebook:https://www.facebook.com/Ramblings-of-a-Designer-Podcast-2347296798835079/ Send us feedback! ramblingsofadesignerpod@gmail.com, Support us on Patreon! patreon.com/ramblingsofadesigner We would love to hear from you!

learning ai career transition software designers south america machine learning ramblings suriname dnn deep neural networks

ECMO PAL: using deep neural networks for survival prediction in venoarterial extracorporeal membrane oxygenation

ESICM Talk

Play Episode Listen Later Mar 6, 2024 11:39

VA-ECMO outcome scores have been previously developed and used extensively for risk adjustment, patient prognostication, and quality control across time and centres. The limitation of such scores is the derivation by using traditional statistical methods which are not capable of covering the complexity of ECMO outcomes. The Extracorporeal Life Support Organization Member Centres have developed a study where they aimed to leverage a large international patient cohort to develop and validate an AI-driven tool for predicting in-hospital mortality of VA-ECMO. The tool was derived entirely from pre-ECMO variables, allowing for mortality prediction immediately after ECMO initiation.To learn more about this study listen to the podcast.

ai predictions survival membrane ecmo oxygenation extracorporeal deep neural networks va ecmo

ACR 2023 Podcast - Day2A

Rheumnow Podcast

Play Episode Listen Later Nov 14, 2023 33:03

An Interview with Kevin Winthrop_ Three Questions About COVID 19 Biologics in RA_ Mechanisms of Delivery Over Mechanisms of Action_ (1) Deep Neural Networks and Radiographic Progression in AxSpA Difficult to Treat axSpA Does GnRHa Reduce Premature Ovarian Insufficiency in SLE Patients on Cyclophosphamide_ Factors Associated with Discontinuation of TNFs in AxSpA From AS Patient to Rheumatologist

interview action rheumatologists discontinuation deep neural networks

Amr Awadallah, CEO of Vectara and co-founder of Cloudera, discusses the future of AI search

AI and the Future of Work

Play Episode Play 15 sec Highlight Listen Later Nov 13, 2023 36:34

Amr and I met on a genAI panel and everything he said was both insightful and contrarian. Immediately, I knew I wanted to introduce him to you. Amr is a legend in the search space who, by the way, also founded Cloudera which went public in 2017 at a valuation of over $5B.Dr. Amr Awadallah is a luminary in the world of information retrieval. He's the CEO and cofounder of Vectara, a company that is revolutionizing how we find meaning across all languages of the world using the latest advances in Deep Neural Networks, Large Language Models, and Natural Language Processing. He previously served as VP of Developer Relations for Google Cloud. Prior to joining Google in Nov 2019, Amr co-founded Cloudera in 2008 and as Global CTO. He also served as vice president of product intelligence engineering at Yahoo! from 2000-2008. Amr received his PhD in EE from Stanford University, and his Bachelor and Masters Degrees from Cairo University, Egypt.Listen and learn...How Amr discovered the power of "talking to software" via LLMs while at GoogleAbout the history of new computing modalitiesAbout the current state of generative AIThe technical explanation for hallucination in LLMsHow do we mitigate bias in LLM models and prevent copyright infringementWhy a semantic understanding of queries is the next frontier in searchThe challenge faced by search providers of making money incorporating ads into LLM-based answersHow "grounded search" will fix the hallucination problemWhat is a "fact" in the era of ChatGPT?How long before we have "antivirus sofware for fact-checking" genAI propagandaHow should AI be regulated... and who is responsible for AI regulationThe next big idea in genAI Amr and I are ready to fundAmr's advice to entrepreneurs... and to himselfReferences in this episode...Eric Olson, Consensus CEO, on AI and the Future of WorkD Das, Sorcero CEO, on AI and the Future of WorkSeth Earley, Earley Information Science, on AI and the Future of WorkChatGPT for searching scientific papers

Mitsubishi Scientist: Are Deep Neural Networks Smarter than Second Graders?

AI Business Podcast

Play Episode Listen Later Oct 30, 2023 32:48

Anoop Cherian, senior principal research scientist at Mitsubishi Electric Research Laboratories, tested the reasoning skills of large language models in a set of logic puzzles to see if they are smarter than young kids.

scientists smarter mitsubishi graders deep neural networks

FODD 11 - Abhishek Pandey - Machine Learning Blueprints for Pharma Discovery Breakthroughs

Future of Drug Discovery

Play Episode Listen Later Sep 22, 2023 27:54

In this talk, Murat speaks with Abhishek Pandey, Principle Research Scientist and Pharma Discovery Group Lead at AbbVie. Abhishek currently leads RAIDERS, a Pharma Discovery group integrating machine learning across all of AbbVie's working groups to augment drug discovery capabilities. Abhishek's groups work spans cheminformatics, imaging research, multi-omics, as well as the computation behind many of AbbVie's current partnerships and collaborations. We'll be exploring how the machine learning group is transforming the processes across multiple working groups at the pharmaceutical giant, AbbVie. We'll also be unravelling the story of how Abhishek began his pioneering work at AbbVie. You will learn: About Abhishek's strategy for bridging chemistry, biology and computer science expertise to integrate machine learning across AbbVie's pharma discovery processes. About Abhishek's vision for the future of drug discovery, as we explore the pressing needs of the pharmaceutical industry from both a technical and social standpoint. About Abhishek: Abhishek started his career as a software engineer at Toshiba. Following this, Abhishek began his PhD in medical engineering and image processing at the University of Arizona. This led Abhishek to become a core member of the precision medicine AI team at Tempus, developing ways to improve cancer imaging pipelines to help oncologists detect cancers more efficiently. Abhishek then joined AbbVie, where he founded the machine learning / deep learning team - the influence of which has expanded across the entire company. Things mentioned: Abhishek Pandey⁠, Principle Research Scientist and Pharma Discovery Group Lead at ⁠AbbVie⁠. RAIDERS: Pharma Discovery and Developmental Sciences team at AbbVie. Google Meta Toshiba Tempus Labs (now Tempus). PADME (Protein And Drug Molecule interaction prEdiction), a framework based on Deep Neural Networks, to predict real-valued interaction strength between compounds and proteins. NeurIPS Conference (Conference on Neural Information Processing Systems) InSitro Facebook Additional Materials: ⁠⁠⁠⁠⁠Watch the recording of the live event! ⁠https://www.youtube.com/watch?v=F9kADawRPOs First time hearing about Antiverse? Antiverse is an artificial intelligence-driven techbio company that specialises in antibody design against difficult-to-drug targets, including G-protein coupled receptors (GPCRs) and ion channels. Headquartered in Cardiff, UK and with offices in Boston, MA, Antiverse combines state-of-the-art machine learning techniques and advanced cell line engineering to develop de novo antibody therapeutics. With a main focus on establishing long-term partnerships, Antiverse has collaborated with two top 20 global pharmaceutical companies. In addition, they are developing a strong internal pipeline of antibodies against several challenging drug targets across various indications. For more information, please visit https://www.antiverse.io Never miss a future event! We'll notify you when there's a new event scheduled. Fill out ⁠this form⁠ to sign up for marketing communications from Antiverse. We post about upcoming events on LinkedIn and Twitter. LinkedIn: ⁠https://www.linkedin.com/company/antiverse/⁠ Twitter (x.com) : ⁠https://twitter.com/AntiverseHQ

Conversations: Eric Topol, MD

The Nocturnists

Play Episode Listen Later Aug 3, 2023 55:20

Emily speaks with cardiologist Eric Topol about his 2019 book Deep Medicine, which explores the potential for AI to enhance medical decision-making, improve patient outcomes, and restore the doctor-patient relationship. Find show notes, transcript, and more at thenocturnists.com.

193: The potency of rock-physics-guided deep neural networks

Seismic Soundoff

Play Episode Listen Later Jul 20, 2023 16:14

Fabien Allo highlights his award-winning article, "Characterization of a carbonate geothermal reservoir using rock-physics-guided deep neural networks." In this episode with host Andrew Geary, Fabien shares the potential of deep neural networks (DNNs) in integrating seismic data for reservoir characterization. He explains why DNNs have yet to be widely utilized in the energy industry and why utilizing a training set was key to this study. Fabien also details why they did not include any original wells in the final training set and the advantages of neural networks over seismic inversion. He closes with how this method of training neural networks on synthetic data might be useful beyond the application to a geothermal study. This episode is an exciting opportunity to hear directly from an award-winning author on some of today's most cutting-edge geophysics tools. Listen to the full archive at https://seg.org/podcast. RELATED LINKS * Fabien Allo, Jean-Philippe Coulon, Jean-Luc Formento, Romain Reboul, Laure Capar, Mathieu Darnet, Benoit Issautier, Stephane Marc, and Alexandre Stopin, (2021), "Characterization of a carbonate geothermal reservoir using rock-physics-guided deep neural networks," The Leading Edge 40: 751–758. - https://doi.org/10.1190/tle40100751.1 BIOGRAPHY Fabien Allo received his BSc in mathematics, physics, and chemistry with a biology option from the Lycée Chateaubriand, Rennes (France) in 2000 and his MSc and engineering degree in geology from the École Nationale Supérieure de Géologie, Nancy (France) in 2003. Since joining CGG 20 years ago, he has held several roles in the UK, Brazil, and now Canada working on inventing, designing, and developing reservoir R&D workflows for seismic forward modeling and inversion with a specific focus on data integration through rock physics. Fabien was recently promoted to the position of rock physics & reservoir expert within CGG's TECH+ Reservoir R&D team. He has increasingly applied geoscience capabilities to energy transition areas, such as carbon capture & sequestration (CCS) and geothermal projects. He received the SEG Award for Best Paper in The Leading Edge in 2021 for a CGG-BRGM co-authored paper published in October 2021: "Characterization of a carbonate geothermal reservoir using rock-physics-guided deep neural networks." (https://www.cgg.com/sites/default/files/2021-10/TLE%20Oct%202021%20Allo%20et%20al%20Final%20published.pdf) CREDITS Seismic Soundoff explores the depth and usefulness of geophysics for the scientific community and the public. If you want to be the first to know about the next episode, please follow or subscribe to the podcast wherever you listen to podcasts. Two of our favorites are Apple Podcasts and"Spotify. If you have episode ideas, feedback for the show, or want to sponsor a future episode, find the "Contact Seismic Soundoff" box at https://seg.org/podcast. Zach Bridges created original music for this show. Andrew Geary hosted, edited, and produced this episode at TreasureMint. The SEG podcast team is Jennifer Cobb, Kathy Gamble, and Ally McGinnis.

The future of human vision

The Future of Everything presented by Stanford Engineering

Play Episode Listen Later Jun 30, 2023 33:31

Neuroscientist Kalanit Grill-Spector studies the physiology of human vision and says that the ways computers and people see are in some ways similar, but in other ways quite different. In fact, she says, rapid advances in computational modeling, such as deep neural networks, applied to brain data and new imaging technologies, like quantitative MRI and diffusion MRI, are revolutionizing our understanding of how the human brain sees. We're unraveling how the brain “computes” visual information, as Grill-Spector tells host Russ Altman on this episode of Stanford Engineering's The Future of Everything podcast.Chapter Time Stamps:(00:01:30) Episode introduction: Exploring the fascinating field of cognitive neuroscience and brain development with Kalanit Grill-Spector.(00:02:45) Dr Grill-Spector's background and research interests: The intersection of cognitive neuroscience, psychology, and computer science.(00:04:00) The crucial role of experience in shaping brain development: Understanding how environmental factors influence neural specialization.(00:09:55) The development of word processing regions in the brain: Investigating the emergence and evolution of brain regions associated with reading and word recognition.(00:11:30) The evolution of word specialization and its implications: Exploring how the brain acquires the ability to read and process words.(00:14:20) Shift in research focus to studying brain development in infants: Exploring the critical early phases of brain development and the impact of experience on neural circuits.(00:16:40) Pokemon, Brain Representation, and Perception: The surprising findings on the continued development of word and face processing regions. Discovering the extended period of specialization and plasticity in these brain areas.(00:19:10) Unexpected decline in specialization for body parts, particularly hands: Examining the trade-off between different cognitive abilities as brain regions specialize.(00:22:00) Understanding the potential impact of experience on brain organization: Examining how environmental factors shape the neural pathways and cognitive capabilities.(00:25:00) Investigating the influence of Pokemon on brain representation and perception: Analyzing the effects of exposure to specific visual stimuli on brain organization.(00:27:15) The unique characteristics of Pokemon stimuli: Exploring how visual features, animacy, and stimulus size affect brain responses.(00:29:00) Specificity of brain representation for Pokemons: Uncovering whether the brain develops distinct neural pathways for Pokemon stimuli.(00:31:45) Comparing the effects of word learning: Understanding the potential trade-offs in brain specialization.(00:32:45) Technical challenges in studying infant's brains: Discussing the need for new tools and analysis methods to study developing brains.

vision future shift discovering unexpected exploring pokemon comparing technical examining analyzing mri investigating specificity deep neural networks computational modeling russ altman stanford engineering

The Importance of Energy Efficiency in AI with Vivienne Sze

CSAIL Alliances Podcasts

Play Episode Listen Later Jun 5, 2023 28:17

Vivienne Sze is an associate professor in MIT's Department of Electrical Engineering and Computer Science. She's a coauthor of Efficient Processing of Deep Neural Networks. A full transcript will be available at cap.csail.mit.edu

mit computer science electrical engineering energy efficiency deep neural networks

The Godfather of AI is Worried About AI

TechStuff

Play Episode Listen Later May 10, 2023 44:51

Dr. Geoffrey Hinton recently retired from Google, saying that he wanted to be able to speak freely about his concerns regarding artificial intelligence without having to consider the impact to his employer. So what is the Godfather of AI worried about? See omnystudio.com/listener for privacy information.

April 25th, 2023 | Apparently everyone's leaking your data

Hacker News Recap

Play Episode Listen Later Apr 26, 2023 13:25

This is a recap of the top 10 posts on Hacker News on April 25th, 2023.(00:36): Smartphones with Qualcomm chip secretly send personal data to QualcommOriginal post: https://news.ycombinator.com/item?id=35698547(01:43): The EU suppressed a 300-page study that found piracy doesn't harm salesOriginal post: https://news.ycombinator.com/item?id=35701785(02:53): Microsoft Edge is leaking the sites you visit to BingOriginal post: https://news.ycombinator.com/item?id=35703789(04:02): Use of antibiotics in farming ‘endangering human immune system'Original post: https://news.ycombinator.com/item?id=35700881(05:17): People who use Notion to plan their whole livesOriginal post: https://news.ycombinator.com/item?id=35698521(06:38): A non-technical explanation of deep learningOriginal post: https://news.ycombinator.com/item?id=35701572(07:47): Shell admits 1.5C climate goal means immediate end to fossil fuel growthOriginal post: https://news.ycombinator.com/item?id=35702201(09:09): Call on the IRS to provide libre tax-filing softwareOriginal post: https://news.ycombinator.com/item?id=35705469(10:36): Deep Neural Networks from Scratch in ZigOriginal post: https://news.ycombinator.com/item?id=35696776(11:54): NitroKey disappoints meOriginal post: https://news.ycombinator.com/item?id=35706858This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

ai data european union smartphones irs scratch notion qualcomm leaking yc 5c microsoft edge hn hacker news deep neural networks

Factorized visual representations in the primate visual system and deep neural networks

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Apr 22, 2023

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.22.537916v1?rss=1 Authors: Lindsey, J. W., Issa, E. B. Abstract: Object classification has been proposed as a principal objective of the primate ventral visual stream. However, optimizing for object classification alone does not constrain how other variables may be encoded in high-level visual representations. Here, we studied how the latent sources of variation in a visual scene are encoded within high-dimensional population codes in primate visual cortex and in deep neural networks (DNNs). In particular, we focused on the degree to which different sources of variation are represented in non-overlapping ("factorized") subspaces of population activity. In the monkey ventral visual hierarchy, we found that factorization of object pose and background information from object identity increased in higher-level regions. To test the importance of factorization in computational models of the brain, we then conducted a detailed large-scale analysis of factorization of individual scene parameters -- lighting, background, camera viewpoint, and object pose -- in a diverse library of DNN models of the visual system. Models which best matched neural, fMRI and behavioral data from both monkeys and humans across 12 datasets tended to be those which factorized scene parameters most strongly. In contrast, invariance to object pose and camera viewpoint in models was negatively associated with a match to neural and behavioral data. Intriguingly, we found that factorization was similar in magnitude and complementary to classification performance as an indicator of the most brainlike models suggesting a new principle. Thus, we propose that factorization of visual scene information is a widely used strategy in brains and DNN models. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC

llc models copy issa primates fmri representations intriguingly biorxiv dnn deep neural networks visual system

Modelling Phenomenological Differences in Aetiologically Distinct Visual Hallucinations Using Deep Neural Networks

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Feb 14, 2023

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.02.13.528288v1?rss=1 Authors: Suzuki, K., Seth, A. K., Schwartzman, D. J. Abstract: Visual hallucinations (VHs) are perceptions of objects or events in the absence of the sensory stimulation that would normally support such perceptions. Although all VHs share this core characteristic, there are substantial phenomenological differences between VHs that have different aetiologies, such as those arising from neurological conditions, visual loss, or psychedelic compounds. Here, we examine the potential mechanistic basis of these differences by leveraging recent advances in visualising the learned representations of a coupled classifier and generative deep neural network - an approach we call 'computational (neuro)phenomenology'. Examining three aetiologically distinct populations in which VHs occur - neurological conditions (Parkinson's Disease and Lewy Body Dementia), visual loss (Charles Bonnet Syndrome, CBS), and psychedelics - we identify three dimensions relevant to distinguishing these classes of VHs: realism (veridicality), dependence on sensory input (spontaneity), and complexity. By selectively tuning the parameters of the visualisation algorithm to reflect influence along each of these phenomenological dimensions we were able to generate 'synthetic VHs' that were characteristic of the VHs experienced by each aetiology. We verified the validity of this approach experimentally in two studies that examined the phenomenology of VHs in neurological and CBS patients, and in people with recent psychedelic experience. These studies confirmed the existence of phenomenological differences across these three dimensions between groups, and crucially, found that the appropriate synthetic VHs were representative of each group's hallucinatory phenomenology. Together, our findings highlight the phenomenological diversity of VHs associated with distinct causal factors and demonstrate how a neural network model of visual phenomenology can successfully capture the distinctive visual characteristics of hallucinatory experience. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC

llc disease cbs differences parkinson visual examining vhs copy modelling hallucinations distinct schwartzman lewy body dementia biorxiv deep neural networks charles bonnet syndrome

Inferring Line-of-Sight Velocities and Doppler Widths from Stokes Profiles of GST NIRIS Using Stacked Deep Neural Networks

Astro arXiv | all categories

Play Episode Listen Later Oct 10, 2022 1:12

Inferring Line-of-Sight Velocities and Doppler Widths from Stokes Profiles of GST NIRIS Using Stacked Deep Neural Networks by Haodi Jiang et al. on Monday 10 October Obtaining high-quality magnetic and velocity fields through Stokes inversion is crucial in solar physics. In this paper, we present a new deep learning method, named Stacked Deep Neural Networks (SDNN), for inferring line-of-sight (LOS) velocities and Doppler widths from Stokes profiles collected by the Near InfraRed Imaging Spectropolarimeter (NIRIS) on the 1.6 m Goode Solar Telescope (GST) at the Big Bear Solar Observatory (BBSO). The training data of SDNN is prepared by a Milne-Eddington (ME) inversion code used by BBSO. We quantitatively assess SDNN, comparing its inversion results with those obtained by the ME inversion code and related machine learning (ML) algorithms such as multiple support vector regression, multilayer perceptrons and a pixel-level convolutional neural network. Major findings from our experimental study are summarized as follows. First, the SDNN-inferred LOS velocities are highly correlated to the ME-calculated ones with the Pearson product-moment correlation coefficient being close to 0.9 on average. Second, SDNN is faster, while producing smoother and cleaner LOS velocity and Doppler width maps, than the ME inversion code. Third, the maps produced by SDNN are closer to ME's maps than those from the related ML algorithms, demonstrating the better learning capability of SDNN than the ML algorithms. Finally, comparison between the inversion results of ME and SDNN based on GST/NIRIS and those from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory in flare-prolific active region NOAA 12673 is presented. We also discuss extensions of SDNN for inferring vector magnetic fields with empirical evaluation. arXiv: http://arxiv.org/abs/http://arxiv.org/abs/2210.04122v1

pearson ml profiles stokes stacked noaa doppler arxiv line of sight inferring deep neural networks velocities solar dynamics observatory

Invariance of object detection in untrained deep neural networks

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Sep 10, 2022

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.09.08.507096v1?rss=1 Authors: Cheon, J., Baek, S., Paik, S.-B. Abstract: The ability to perceive visual objects with various types of transformations, such as rotation, translation, and scaling, is crucial for consistent object recognition. In machine learning, invariant object detection for a network is often implemented by augmentation with a massive number of training images, but the mechanism of invariant object detection in biological brains - how invariance arises initially and whether it requires visual experience - remains elusive. Here, using a model neural network of the hierarchical visual pathway of the brain, we show that invariance of object detection can emerge spontaneously in the complete absence of learning. First, we found that units selective to a particular object class arise in randomly initialized networks even before visual training. Intriguingly, these units show robust tuning to images of each object class under a wide range of image transformation types, such as viewpoint rotation. We confirmed that this "innate" invariance of object selectivity enables untrained networks to perform an object-detection task robustly, even with images that have been significantly modulated. Our computational model predicts that invariant object tuning originates from combinations of non-invariant units via random feedforward projections, and we confirmed that the predicted profile of feedforward projections is observed in untrained networks. Our results suggest that invariance of object detection is an innate characteristic that can emerge spontaneously in random feedforward networks. Copy rights belong to original authors. Visit the link for more info Podcast created by PaperPlayer

copy object detection intriguingly untrained paik baek biorxiv deep neural networks invariance

Comparison of visual quantities in untrained deep neural networks

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Sep 10, 2022

Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.09.08.507097v1?rss=1 Authors: Lee, H., Choi, W., Lee, D., Paik, S.-B. Abstract: The ability to compare quantities of visual objects with two distinct measures, proportion and difference, is observed in newborn animals. Nevertheless, how this function originates in the brain, even before training, remains unknown. Here, we show that neuronal tuning for quantity comparison can arise spontaneously in completely untrained deep neural networks. Using a biologically inspired model neural network, we found that units selective to proportions and differences between visual quantities emerge in randomly initialized networks and that they enable the network to perform quantity comparison tasks. Further analysis shows that two distinct tunings to proportion and difference both originate from a random summation of monotonic, nonlinear responses to changes in relative quantities. Notably, we found that a slight difference in the nonlinearity profile determines the type of measure. Our results suggest that visual quantity comparisons are primitive types of functions that can emerge spontaneously in random feedforward networks. Copy rights belong to original authors. Visit the link for more info Podcast created by PaperPlayer

comparison visual copy notably choi untrained paik quantities biorxiv deep neural networks

שישה מאמרים שכל דאטא סיינטינסט חייב להכיר עם שקד זיכלינסקי

ExplAInable

Play Episode Listen Later Sep 4, 2022 25:43

בפרק זה אירחנו את שקד זיכלינסקי, ראש קבוצת ההמלצות של לייטריקס. שקד ריכז עבורנו את ששת המאמרים החשובים שכל דאטא סיינטיסט מודרני חייב להכיר. ששת המאמרים הם: (1) Attention Is All You Need (2) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (3) A Style-Based Generator Architecture for Generative Adversarial Networks (4) Learning Transferable Visual Models From Natural Language Supervision (5) Mastering the Game of Go with Deep Neural Networks and Tree Search (6) Deep Neural Networks for YouTube Recommendations שקד גם כתב בהרחבה במדיום פה: https://towardsdatascience.com/6-papers-every-modern-data-scientist-must-read-1d0e708becd

game mastering generative adversarial networks deep neural networks

Energy functions and shortcut learning

Underrated ML

Play Episode Listen Later Jul 26, 2022 89:04

This week we are joined by Kyunghyun Cho. He is an associate professor of computer science and data science at New York University, a research scientist at Facebook AI Research and a CIFAR Associate Fellow. On top of this he also co-chaired the recent ICLR 2020 virtual conference.We talk about a variety of topics in this weeks episode including the recent ICLR conference, energy functions, shortcut learning and the roles popularized Deep Learning research areas play in answering the question “What is Intelligence?”.Underrated ML Twitter: https://twitter.com/underrated_mlKyunghyun Cho Twitter: https://twitter.com/kchonyc?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5EauthorPlease let us know who you thought presented the most underrated paper in the form below:https://forms.gle/97MgHvTkXgdB41TC8Links to the papers:“Shortcut Learning in Deep Neural Networks” - https://arxiv.org/pdf/2004.07780.pdf"Bayesian Deep Learning and a Probabilistic Perspective of Generalization” - https://arxiv.org/abs/2002.08791"Classifier-agnostic saliency map extraction" - https://arxiv.org/abs/1805.08249“Deep Energy Estimator Networks” - https://arxiv.org/abs/1805.08306“End-to-End Learning for Structured Prediction Energy Networks” - https://arxiv.org/abs/1703.05667“On approximating nabla f with neural networks” - https://arxiv.org/abs/1910.12744“Adversarial NLI: A New Benchmark for Natural Language Understanding“ - https://arxiv.org/abs/1910.14599“Learning the Difference that Makes a Difference with Counterfactually-Augmented Data” - https://arxiv.org/abs/1909.12434“Learning Concepts with Energy Functions” - https://openai.com/blog/learning-concepts-with-energy-functions/

learning energy intelligence new york university makes shortcuts functions deep learning generalization deep neural networks natural language understanding classifier iclr

DoK Talks #141 - Dossier: multi-tenant distributed Jupyter Notebooks // Iacoppo Colonnelli & Dario Tranchitella

Data on Kubernetes Community

Play Episode Listen Later Jul 15, 2022 60:10

https://go.dok.community/slack https://dok.community ABSTRACT OF THE TALK When providing data analysis as a service, one must tackle several problems. Data privacy and protection by design are crucial when working on sensitive data. Performance and scalability are fundamental for compute-intensive workloads, e.g. training Deep Neural Networks. User-friendly interfaces and fast prototyping tools are essential to allow domain experts to experiment with new techniques. Portability and reproducibility are necessary to assess the actual value of results. Kubernetes is the best platform to provide reliable, elastic, and maintainable services. However, Kubernetes alone is not enough to achieve large-scale multi-tenant reproducible data analysis. OOTB support for multi-tenancy is too rough, with only two levels of segregation (i.e. the single namespace or the entire cluster). Offloading computation to off-cluster resources is non-trivial and requires the user's manual configuration. Also, Jupyter Notebooks per se cannot provide much scalability (they execute locally and sequentially) and reproducibility (users can run cells in any order and any number of times). The Dossier platform allows system administrators to manage multi-tenant distributed Jupyter Notebooks at the cluster level in the Kubernetes way, i.e. through CRDs. Namespaces are aggregated in Tenants, and all security and accountability aspects are managed at that level. Each Notebook spawns into a user-dedicated namespace, subject to all Tenant-level constraints. Users can rely on provisioned resources, either in-cluster worker nodes or external resources like HPC facilities. Plus, they can plug their computing nodes in a BYOD fashion. Notebooks are interpreted as distributed workflows, where each cell is a task that one can offload to a different location in charge of its execution. BIO Iacopo Colonnelli is a Computer Science research fellow. He received his Ph.D. with honours in Modeling and Data Science at Università di Torino with a thesis on novel workflow models for heterogeneous distributed systems, and his master's degree in Computer Engineering from Politecnico di Torino with a thesis on a high-performance parallel tracking algorithm for the ALICE experiment at CERN. His research focuses on both statistical and computational aspects of data analysis at large scale and on workflow modeling and management in heterogeneous distributed architectures. Dario is an SWE that turned DevOps, and he's regretting this choice day by day. Besides making memes on Twitter that gain more reactions than technical discussions, leading the development of Open Source projects at CLASTIX, an Open Source-based start-up focusing on Multi-Tenancy in Kubernetes. KEY TAKE-AWAYS FROM THE TALK From this talk, people will learn: - The different requirements of Data analysis as a service - How to configure for multi-tenancy at the cluster level with Capsule - How to write distributed workflows as Notebooks with Jupyter Workflows - How to combine all these aspects into a single platform: Dossier All the software presented in the talk is OpenSource, so attendees can directly play with them and include them in their experiments with no additional restrictions.

Explainability, Human Aware AI & sentience in large language models | Dr. Subbarao Kambhampati

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Jun 27, 2022 144:42

Are large language models really sentient or conscious? What is explainability (XAI) and how can we create human-aware AI systems for collaborative tasks? Dr. Subbarao Kambhampati sheds some light on these topics, generating explanations for human-in-loop AI systems and understanding 'intelligence' in context to AI systems. He is a Prof of Computer Science at Arizona State University and director of the Yochan lab at ASU where his research focuses on decision-making and planning specifically in the context of human-aware AI systems. He has received multiple awards for his research contributions. He has also been named a fellow of AAAI, AAAS, and ACM and also a distinguished alumnus from the University of Maryland and also recently IIT Madras.Time stamps of conversations:00:00:40 Introduction00:01:32 What got you interested in AI?00:07:40 Definition of intelligence that is not related to human intelligence00:13:40 Sentience vs intelligence in modern AI systems00:24:06 Human aware AI systems for better collaboration00:31:25 Modern AI becoming natural science instead of an engineering task00:37:35 Understanding symbolic concepts to generate accurate explanations00:56:45 Need for explainability and where01:13:00 What motivates you for research, the application associated or theoretical pursuit?01:18:47 Research in academia vs industry01:24:38 DALL-E performance and critiques01:45:40 What makes for a good research thesis? 01:59:06 Different trajectories of a good CS PhD student02:03:42 Focusing on measures vs metrics 02:15:23 Advice to students on getting started with AIArticles referred in the conversationAI as Natural Science?: https://cacm.acm.org/blogs/blog-cacm/261732-ai-as-an-ersatz-natural-science/fulltextPolanyi's Revenge and AI's New Romance with Tacit Knowledge: https://cacm.acm.org/magazines/2021/2/250077-polanyis-revenge-and-ais-new-romance-with-tacit-knowledge/fulltextMore about Prof. RaoHomepage: https://rakaposhi.eas.asu.edu/Twitter: https://twitter.com/rao2zAbout the Host:Jay is a PhD student at Arizona State University.Linkedin: https://www.linkedin.com/in/shahjay22/Twitter: https://twitter.com/jaygshah22Homepage: https://www.public.asu.edu/~jgshah1/ for any queries.Stay tuned for upcoming webinars!***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

Xenobots and Critical Learning Periods

Underrated ML

Play Episode Listen Later Jun 19, 2022 65:26

Have a listen to the first ever Underrated ML podcast! We'll walk you through two papers which we found really interesting followed by a few questions and then finally finishing with our verdict on what we believe was the most underrated paper!Links to the papers can be found below.Critical Learning Periods in Deep Neural Networks - https://arxiv.org/abs/1711.08856A scalable pipeline for designing reconfigurable organisms - https://www.pnas.org/content/117/4/1853

learning periods xenobots deep neural networks

Enhancing AI Diagnosis

The Discovery Files

Play Episode Listen Later May 12, 2022 1:30

Episode 10: Deep Neural Networks in Julia with Flux.jl

Talk Julia

Play Episode Listen Later Mar 5, 2022 36:57

David and Randy explore deep neural networks in Julia using Flux.jl by recreating Grant Sanderson's model for predicting handwritten digits in the MNIST data set. We also show how to visualize model results and training performance in TensorBoard using the TensorBoardLogging.jl package.

machine learning flux deep learning neural networks deep neural networks mnist grant sanderson david amos

Season 2 Episode 16: Grace Lindsay - Computational neuroscience and her book, "Models of the Mind"

OHBM Neurosalience

Play Episode Listen Later Feb 11, 2022 66:47

In this episode Dr Peter Bandettini and co-host Dr Brendan Ritchie interview Dr Grace Lindsay. They find out about her new book 'Models of the mind' and about the process of writing a book. In doing so, they consider different types of brain models, from simply descriptive to more mechanistic, from too simple to overfitted. They describe the challenge in neuroscience of network modelling - the many unknowns and limited data and how output of the model may help inform its accuracy. They then discuss specific models, such as Deep Neural Networks, and how this type of modelling may progress in the future. Last, Lindsay gives some thoughts about the future hopes, philosophies, and strategies of modelling - how doing it well is both an art and a science.

models computational neuroscience deep neural networks

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers by lifelonglearner, Peter Hase

The Nonlinear Library: LessWrong Top Posts

Play Episode Listen Later Dec 11, 2021 175:53

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers, published by lifelonglearner, Peter Hase on the AI Alignment Forum. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Peter Hase. UNC Chapel Hill. Owen Shen. UC San Diego. With thanks to Robert Kirk and Mohit Bansal for helpful feedback on this post. Introduction. Model interpretability was a bullet point in Concrete Problems in AI Safety (2016). Since then, interpretability has come to comprise entire research directions in technical safety agendas (2020); model transparency appears throughout An overview of 11 proposals for building safe advanced AI (2020); and explainable AI has a Twitter hashtag, #XAI. (For more on how interpretability is relevant to AI safety, see here or here.) Interpretability is now a very popular area of research. The interpretability area was the most popular in terms of video views at ACL last year. Model interpretability is now so mainstream there are books on the topic and corporate services promising it. So what's the state of research on this topic? What does progress in interpretability look like, and are we making progress? What is this post? This post summarizes 70 recent papers on model transparency, interpretability, and explainability, limited to a non-random subset of papers from the past 3 years or so. We also give opinions on several active areas of research, and collate another 90 papers that are not summarized. How to read this post. If you want to see high-level opinions on several areas of interpretability research, just read the opinion section, which is organized according to our very ad-hoc set of topic areas. If you want to learn more about what work looks like in a particular area, you can read the summaries of papers in that area. For a quick glance at each area, we highlight one standout paper per area, so you can just check out that summary. If you want to see more work that has come out in an area, look at the non-summarized papers at the end of the post (organized with the same areas as the summarized papers). We assume readers are familiar with basic aspects of interpretability research, i.e. the kinds of concepts in The Mythos of Model Interpretability and Towards A Rigorous Science of Interpretable Machine Learning. We recommend looking at either of these papers if you want a primer on interpretability. We also assume that readers are familiar with older, foundational works like "Why Should I Trust You?: Explaining the Predictions of Any Classifier." Disclaimer: This post is written by a team of two people, and hence its breadth is limited and its content biased by our interests and backgrounds. A few of the summarized papers are our own. Please let us know if you think we've missed anything important that could improve the post. Master List of Summarized Papers. Theory and Opinion. Explanation in Artificial Intelligence: Insights from the Social Sciences. Chris Olah's views on AGI safety. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness? The elephant in the interpretability room: Why use attention as explanation when we have saliency methods? Aligning Faithful Interpretations with their Social Attribution. Evaluation. Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction. Comparing Automatic and Human Evaluation of Local Explanations for Text Classification. Do explanations make VQA models more predictable to a human? Sanity Checks for Saliency Maps. A Benchmark for Interpretability Methods in Deep Neural Networks. Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? ERASER: A Benchmark to Evaluate Rationalized NLP Models. On quantitative aspects of model interpretability. Manipulating and M...

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers by lifelonglearner, Peter Hase

The Nonlinear Library: Alignment Forum Top Posts

Play Episode Listen Later Dec 10, 2021 152:44

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers, published by lifelonglearner, Peter Hase on the AI Alignment Forum. Peter Hase. UNC Chapel Hill. Owen Shen. UC San Diego. With thanks to Robert Kirk and Mohit Bansal for helpful feedback on this post. Introduction. Model interpretability was a bullet point in Concrete Problems in AI Safety (2016). Since then, interpretability has come to comprise entire research directions in technical safety agendas (2020); model transparency appears throughout An overview of 11 proposals for building safe advanced AI (2020); and explainable AI has a Twitter hashtag, #XAI. (For more on how interpretability is relevant to AI safety, see here or here.) Interpretability is now a very popular area of research. The interpretability area was the most popular in terms of video views at ACL last year. Model interpretability is now so mainstream there are books on the topic and corporate services promising it. So what's the state of research on this topic? What does progress in interpretability look like, and are we making progress? What is this post? This post summarizes 70 recent papers on model transparency, interpretability, and explainability, limited to a non-random subset of papers from the past 3 years or so. We also give opinions on several active areas of research, and collate another 90 papers that are not summarized. How to read this post. If you want to see high-level opinions on several areas of interpretability research, just read the opinion section, which is organized according to our very ad-hoc set of topic areas. If you want to learn more about what work looks like in a particular area, you can read the summaries of papers in that area. For a quick glance at each area, we highlight one standout paper per area, so you can just check out that summary. If you want to see more work that has come out in an area, look at the non-summarized papers at the end of the post (organized with the same areas as the summarized papers). We assume readers are familiar with basic aspects of interpretability research, i.e. the kinds of concepts in The Mythos of Model Interpretability and Towards A Rigorous Science of Interpretable Machine Learning. We recommend looking at either of these papers if you want a primer on interpretability. We also assume that readers are familiar with older, foundational works like "Why Should I Trust You?: Explaining the Predictions of Any Classifier." Disclaimer: This post is written by a team of two people, and hence its breadth is limited and its content biased by our interests and backgrounds. A few of the summarized papers are our own. Please let us know if you think we've missed anything important that could improve the post. Master List of Summarized Papers. Theory and Opinion. Explanation in Artificial Intelligence: Insights from the Social Sciences. Chris Olah's views on AGI safety. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness? The elephant in the interpretability room: Why use attention as explanation when we have saliency methods? Aligning Faithful Interpretations with their Social Attribution. Evaluation. Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction. Comparing Automatic and Human Evaluation of Local Explanations for Text Classification. Do explanations make VQA models more predictable to a human? Sanity Checks for Saliency Maps. A Benchmark for Interpretability Methods in Deep Neural Networks. Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? ERASER: A Benchmark to Evaluate Rationalized NLP Models. On quantitative aspects of model interpretability. Manipulating and Measuring Model Interpretability. Methods. Estimating Feature Importance. Neuron Shapley...

Cockpit: A Practical Debugging Tool for Training Deep Neural Networks

Papers Read on AI

Play Episode Listen Later Nov 23, 2021 34:07

When engineers train deep learning models, they are very much “flying blind”. Commonly used approaches for realtime training diagnostics, such as monitoring the train/test loss, are limited. Assessing a network's training process solely through these performance indicators is akin to debugging software without access to internal states through a debugger. To address this, we present COCKPIT, a collection of instruments that enable a closer look into the inner workings of a learning machine, and a more informative and meaningful status report for practitioners. It facilitates the identification of learning phases and failure modes, like ill chosen hyper parameters. 2021: Frank Schneider, Felix Dangel, Philipp Hennig https://arxiv.org/pdf/2102.06604v2.pdf

training practical tool assessing cockpit debugging deep neural networks

#13 National Influencers, International Norms

Technopolitik

Play Episode Listen Later Nov 17, 2021 23:09

Cyberpolitik #1: France’s Influence Operations Doctrine— Prateek WaghreIn October, France announced a new doctrine for Information Warfare. This development has received surprisingly little attention in English-language discourse over the last three weeks. It was initially reported by Francesco Busseletti, who highlighted: Objective: to counter the growing spread of fake news and disinformation, aimed at weakening the image of Paris and weakening its armed forces, especially abroad such as the Sahel. Considering that its adversaries no longer hesitate to use the weapon of social media against its military operations, France intends to “win the war before the war”. Its strategy boils down to “being on the offensive” …The Defence Minister Florence Parly’s speech also highlighted this aspect of the “war before the war”. Here’s an excerpt from a google-translated version of her speech.“When used wisely, the weapon of information allows you to win without fighting.”What does the doctrine say?The doctrine identifies six characteristics of the “informational layer of cyberspace”:Contraction of space and time.Ability to conceal/falsify origin due to anonymity.Difficulty with erasing information since it can be duplicated, moved, and re-used without the original context.Any individual can produce and broadcast information. (The minister’s speech seems to have gone as far as stating these individual and anonymous actors are at par with media organisations)The point is that social networks have an equalizing power: on Twitter, the voice of an anonymous user counts as much as that of a major media whose essential function is to inform (sic).Continuous innovation such as deepfakes, AI, AR/VR, etc.The presence of operators who impose their own regulations. A challenge for law enforcement as the space is ‘dematerialised.It defines two types of actors that threaten armed forces:Noting that information war is already an everyday reality for the military, it goes on to say that ‘mastery’ in the information field is now a pre-condition for ‘operational superiority’. And that cyberspace offers opportunities to create effects in ‘both information and physical environments’. The document is peppered with many important statements about Lutte Informatique D’influence (L2I).Definition:military operations conducted in the informational layer of cyberspace to detect, characterize and counter attacks, support StratCom, provide information or (perform) deception, independently or in combination with other operations.L2I stands at “the confluence of cyber defence and influence”. And that it requires skills in common with LID (defensive cyber operations) and LIO (offensive cyber operations).L2I offers opportunities for ‘intelligence gathering’ and ‘deception' operations’. (The minister’s speech defined some boundaries explicitly)the French armies will not conduct an information operation (within) the national territory. The French armies … will not destabilize a foreign state through information actions that would target, for example, its electoral processes.As future challenges, the doctrine identifies the need to build skills and tools, as well as cooperation with firms that specialise in the field and coalitions with allies to coordinate responses.Operationally, this would fall within the purview of the Chief of Staff of the armed forces, who would further rely on the Cyber Defense Commander (COMCYBER) and specialised military units.Two more questionsFor France to come out and explicitly state its doctrine is undoubtedly a significant step. But this also raises two broader questions.What should other democracies do?What will DCN operators do?Camille Francois rightly points out that it raises the question of what democracies can/should do in this space and the possibility of gaining a better understanding of techniques used by countries not named - Russia, China, or Iran.Thomas Rid, in his book Active Measures, argues that liberal democracies cannot be good at disinformation. “For liberal democracies in particular, disinformation represents a double threat: being at the receiving end of active measures will undermine democratic institutions—and giving in to the temptation to design and deploy them will have the same result. It is impossible to excel at disinformation and at democracy at the same time. The stronger and the more robust a democratic body politic, the more resistant to disinformation it will be—and the more reluctant to deploy and optimize disinformation. Weakened democracies, in turn, succumb more easily to the temptations of active measures.”Then, there’s the question of Digital Communication Networks which have become the battlefield for such operations. As Lukasz Olenik notes in his overview of the French doctrine, Facebook has taken action against Coordinated Inauthentic Behaviour it identified originating from a network with links to the French Military in December 2020:We found this activity as part of our internal investigation into suspected coordinated inauthentic behavior in Francophone Africa. Although the people behind it attempted to conceal their identities and coordination, our investigation found links to individuals associated with French military.Now that France has explicitly stated its doctrine (and maybe others will follow), will platforms act more aggressively, considering they are already under fire for either enabling or not doing enough to mitigate the fallout from influence operations? Or, will there be wink-wink-nudge-nudge arrangements between them and a particular set of states?Note: Google Translate was used for French to English translations.If you enjoy this newsletter please consider taking our 12 week Graduate Certificate Programmes in Technology & Policy, Public Policy, Defence & Foreign Affairs and Health & Life Sciences. Click here to apply and know more. You can also get a gift coupon worth ₹1000 every time you successfully refer a friend to our programmes.Siliconpolitik #1: AI Chips — Arjun GargeyasWhat are They?One of the emerging applications of semiconductor devices is the concept of Artificial Intelligence (AI) chips. With new and emerging technologies cropping up, there is an increased need for chipsets with increasing computational power and capabilities. Technologies like Machine Learning and Deep Neural Networks, which are part of the AI ecosystem, have a tremendous workload that cannot be fulfilled by traditional chipsets. AI algorithms work on parallel processing or parallelism, which is the ability to multitask and simultaneously run different computational processes. AI chips, in recent years, have tried to incorporate the needs of AI algorithms into chipsets that can be used both in the cloud as well as at network edges (in smartphones, tablets, and other consumer devices).The diverse applications of AI chips have increased its role in the global economy with companies from various industries all looking to maximise the benefits of AI chip technology. Robotics and autonomous driving, for example, need AI algorithms for efficient and effective working, with the computational power of the chipsets needing incredibly fast processing speeds. This has evolved the role of chipsets with AI capabilities from only being used in the cloud or servers to being used in consumer products at the network edges. However, applications such as Biometrics and Image Recognition need AI chips in the cloud or servers for maintaining a large amount of data. The use of AI chips remains integral in data centers which eventually reduces operational costs and improves information management. Why They MatterThe market for AI chips has consistently increased in the last decade with AI chipsets projected to account for 22% of the global AI revenue by 2022. A strong compounded annual growth rate of 54% has been projected for the AI chips market with technologically advanced regions like the Americas and Europe dominating the market in the future. AI chips also rely on a variety of companies, ranging from smartphone manufacturers like Apple, Samsung, and Huawei, to traditional chip designers like Qualcomm and MediaTek, to intellectual property (IP) license providers like ARM. With most of the major semiconductor companies across the world in the business, AI chips look to be the next big thing for the industry. Semiconductor companies have already thrown their hats in the AI ring with the development of advanced AI chips like Graphical Processing Units (GPUs). NVIDIA has a dedicated application programming language called CUDA used in parallel computing on GPUs. Other targeted AI chips like Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs) are developed for specific applications of AI technology. Companies like Microsoft and Google have also invested in the manufacture of these chipsets keeping in mind specific needs such as the speech processing unit of Google Assistant.With increasing global economic revenue and a large market ripe for capture, the presence of China in the AI chips has also been increasing. AI chip funding activity in China has been driven by the hope of creating industry-leading capabilities in machine learning, deep compression, pruning, and system-level optimization for neural networks. Chinese technology companies like Alibaba and Huawei have invested heavily in the manufacture of AI chips for smartphones and other devices. Some Bitcoin mining equipment manufacturers are also getting into the AI optimization game. With domestic AI research in China still playing catch up to the capabilities of Western countries like the United States, these local manufacturing companies have relied on tweaking existing algorithms to create modified AI models. But increased investments along with state support and financing, similar to the semiconductor industry in China, has made AI chips an important technology worth pursuing in technologically adept states. The race for the domination of the global AI chips markets is something to watch out for in the very near future. Cyberpolitik #2: Are Norms Possible?— Sapni G KSince the last edition of this newsletter, much has happened on cyberspace and international action for establishing norms for its operation and regulation. The United States of America joined the Paris Call for Trust and Security in the cyberspace. The 2018 Call led by the French, proposes a multi-stakeholder model for laying down norms for activity in cyberspace during peacetime. This includes, but is not limited to cybersecurity and the concerns of systemic harms to individuals and critical infrastructure. The Call details nine principles that are open for states, local governments, companies, and civil society organizations to support. Protect individuals and infrastructureProtect the internet Defend electoral processes Defend intellectual propertyNon-proliferation Lifecycle security Cyber hygiene No private hack back International norms They incorporate norms of international law, including the ideas put forth by the UDHR, customary international law, and state laws on the governance of information and communication technologies. This operates as a non-binding, non-enforceable set of principles that are to guide the supporters of the Call and their actions. Most major US tech companies including Microsoft, Google, Facebook Inc (now Meta) are already supporters of the Call and have engaged closely with the various associated working groups. However, the US officially supporting the Call signals that it is no longer holding back in international norm-setting in cyberspace. This could also be read as a furtherance of the USA’s reinvigorated interest in cyber norms, both in peacetime and military applications, as evidenced by recent documents such as the 2021 Interim National Security Strategic Guidance and the recent report by the Department of Defense. However, it is noteworthy that the US has not yet made any concrete steps to sign up to the Global Commission on the Stability of Cyberspace, an effort led by research institutes in the Netherlands and Singapore with the support of the French, Dutch, and Singapore governments, which is also engaged in drawing out international norms for cyberspace during peacetime and armed conflict. China, Russia, Israel, and Iran are other major actors in cyberspace that have not supported the call yet. This is indicative of the fissures in international norm-setting on cyberspace, particularly when China is marching ahead creating a regulatory environment that can have ripple effects internationally. India has not officially supported the Call, but several Indian enterprises and the Karnataka Centre of Excellence of Cybersecurity have joined the Call. It is a proposal worth consideration for the Indian government. An early head start can give India a definitive say in the development of doctrines as well as import legislative principles that can be beneficial to the many millions of Indians who go online every day. Siliconpolitik #2: US-China-Chips — It’s Complicated— Pranay KotasthaneThree recent news reports have turned the world's attention back to the links between the US and China in the semiconductor domain. Until now, the commonplace understanding is that the US is focused on constraining China's progress in the semiconductor domain, a weak link in China's otherwise impressive technology stack. These news reports contest this narrative by suggesting the constraints don't seem to be working, as many US investors and firms are still flocking to China.WSJ reports that between 2017 and 2020, many US companies, including Intel, have invested in Chinese design companies. The number of deals (58) has more than doubled when compared to the 2013-2016 period.Bloomberg reports Intel wanted to start a manufacturing plant in Chengdu, but the White House officials discouraged it.These reports come on the heels of another big claim in mid-October, when Alibaba unveiled a 5nm server chip, making many heads turn. This news seemed to indicate that China's pursuit of semiconductor self-sufficiency is bearing fruit despite the geopolitical headwinds.Connecting the DotsIntel seems to be interested in China a lot. While the WSJ report showed that Intel is among the active investors in a Chinese Electronic Design Automation (EDA) firm, the Bloomberg report points out that Intel also wants to build a fab in Chengdu. It’s notable that both these stages of the semiconductor value chain are precisely where the US had planned to restrict Chinese access during the Trump administration. Reportedly, the US NSA Jake Sullivan and a few senators, want to change the investment screening methods to prevent such deals in the future.Why are US companies still rushing to China?The supply side: The Chinese government's incentives are 'crowding in' investments from Chinese firms and global semiconductor players alike.The demand side: A significant number of customers of chip makers are based in China - laptop manufacturers, phone manufacturers, servers etc. Companies still want a piece of that pie because homegrown alternatives in China are not enough, yet. It's a mouth-watering market, still.My initial assessmentThe number of investment deals between 2017-20 (58) doesn't sound that big in the overall scheme of things. They also mostly appear to be in chip design firms. What this does suggest is like many industrial policies, there is a crowding-in of capital. When a player the size of the Chinese government throws big money at a problem (starting the Chip Fund in 2014), this is expected to happen. There will be national champions and duds, both. The question really is, how long such subsidies can be sustained.The time period 2017-2020 suggests that the US companies rushed into China before the Trump administration tightened the export controls.Intel's investment in a Chinese EDA firm and a possible fab is indeed worrying. Although, the tone of the Chengdu fab proposal suggests it is more a tactic to get the CHIPS Act passed in the US, which will guarantee big subsidies for the likes of Intel back home. The report had no numbers, or plans, just a few unnamed sources.The demand side question is an important one. As long as China remains the hub for electronics Original Equipment Manufacturers (OEMs), chip makers will find it attractive to sell their products to China. Solving this will require a plurilateral effort to move electronics manufacturing -- and not leading-edge chip manufacturing alone -- out of China.Finally, the Alibaba server chip news report has many unknowns. Unveiling a chip is different from being able to produce it. Manufacturing at 5nm is not possible in China. They must rely on TSMC (and now Samsung) for this purpose. Moreover, the processor IP is still ARM, something that Alibaba hasn't been able to displace.The reportage of the kind WSJ, Bloomberg is putting out is indicative of the change in mindset in the US. A few years ago, no one would have even cared about such investments. These are front-page news items now.I expect some more export controls and more subsidies from the US government, both.Our Reading Menu1. [Full Text] of the Paris Call for Trust and Security in the cyberspace2. [Full Text] 2021 Interim National Security Strategic Guidance of the White house 3. [Policy Study] Principles for Content Policy and Governance by Chris Riley, R Street 4. [Article] by Oleg Shakirov discussing the US-Russia rapprochement on Information and Cyber Security5. [Blog] by Oleg Shakirov explaining why US-Russia cooperation on countering Ransomware threats makes sense This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hightechir.substack.com

united states america trust ai donald trump europe english google israel china apple technology france french russia chinese international microsoft western influencers chief blog white house indian connecting security defense iran protect artificial intelligence companies excellence wall street journal netherlands singapore dutch definition americas cybersecurity ip intel bloomberg unveiling samsung ability cyber complicated manufacturing governance machine learning arm public policy indians robotics defend nvidia stability difficulty continuous huawei alibaba objective ransomware lifecycle norms qualcomm reportedly semiconductors ar vr biometrics sahel gpus google assistant cyberspace tsmc lid noting contraction chengdu us russia chips act technology policy weakened lio cuda fulltext information warfare dcn mediatek chris riley operationally active measures global commission r street image recognition deep neural networks francophone africa stratcom thomas rid udhr

Erik Hoel: Dreams, Sleep, Deep Learning, Evolutionary Function of Fiction & Art | #43

Mind & Matter

Play Episode Play 57 sec Highlight Listen Later Nov 4, 2021 87:16 Transcription Available

Nick talks to neuroscientist and writer Dr. Erik Hoel. Erik is a professor of biology at Tufts University. He received his PhD in neuroscience from the University of Wisconsin, where he studied under the sleep and consciousness researcher, Giulio Tononi. He did postdoctoral work at Columbia University, where he used information theory and other analytical tools to explore the biological basis of consciousness. He has come up with the so-called Overfitted Brain Hypothesis of dreaming, which explains the potential adaptive function of dreams by drawing analogies to techniques used to train Deep Neural Networks in the world of machine learning. Erik and Nick discuss the biology and phenomenology of dreams and sleep generally, including some of the various theories for why we sleep. They also discuss Deep Learning (on a very basic level) and Erik described the Overfitted Brain Hypothesis of dreaming. They also discuss fiction and the arts, including Erik's new novel and the potential evolutionary reasons for why humans create and consume fiction, as well as some technology-driven developments that are reshaping how we create and consume written work online. USEFUL LINKS:Download the podcast & follow Nick at his website[https://www.nickjikomes.com]Support the show on Patreon & get early access to episodes[https://www.patreon.com/nickjikomes]Sign up for the weekly Mind & Matter newsletter[https://mindandmatter.substack.com/]Athletic Greens, comprehensive daily nutrition (Free 1-year supply of Vitamin D w/ purchase)[https://www.athleticgreens.com/mindandmatter]Try MUD/WTR, a mushroom-based coffee alternative[https://www.mudwtr.com/mindmatter]Discount Code ($5 off) = MINDMATTEROrganize your digital highlights & notes w/ Readwise (2 months free w/ subscription)[https://readwise.io/nickjikomes/]Start your own podcast (get $20 Amazon gift card after signup)[https://www.buzzsprout.com/?referrer_id=1507198]Buy Mind & Matter T-Shirts[https://www.etsy.com/shop/OURMIND?ref=simple-shop-header-name&listing_id=1036758072§ion_id=34648633]Connect with Nick Jikomes on Twitter[https://twitter.com/trikomes]Learn more about our podcast sponsor, Dosist[https://dosist.com/]ABOUT Nick Jikomes:Nick is a neuroscientist and podcast host. He is currently Director of Science & Innovation at Leafly, a technology startup in the legal cannabis industry. He received a Ph.D. in Neuroscience from Harvard University and a B.S. in Genetics from the University of Wisconsin-Madison.Support the show (https://www.patreon.com/nickjikomes)

Biological Intelligence and the Limitations of Deep Neural Networks – Intel on AI Season 3, Episode 2

Intel on AI

Play Episode Listen Later Nov 3, 2021 37:45

In this episode of Intel on AI host Amir Khosrowshahi and Melanie Mitchell talk about the paradox of studying human intelligence and the limitations of deep neural networks. Melanie is the Davis Professor of Complexity at the Santa Fe Institute, former professor of Computer Science at Portland State University, and the author/editor of six books and numerous scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems, including Complexity: A Guided Tour and Artificial Intelligence: A Guide for Thinking Humans. In the episode, Melanie and Amir discuss how intelligence emerges from the substrate of neurons and why being able to perceive abstract similarities between different situations via analogies is at the core of cognition. Melanie goes into detail about deep neural networks using spurious statistical correlations, the distinction between generative and discriminative systems and machine learning, and the theory that a fundamental part of the human brain is trying to predict what is going to happen next based on prior experience. She also talks about creating the Copycat software, the dangers of artificial intelligence (AI) being easy to manipulate even in very narrow areas, and the importance of getting inspiration from biological intelligence. Academic research discussed in the podcast episode: Gödel, Escher, Bach: an Eternal Golden Braid Fluid Concepts and Creative Analogies: Computer Models Of The Fundamental Mechanisms Of Thought A computational model for solving problems from the Raven's Progressive Matrices intelligence test using iconic visual representations A Framework for Representing Knowledge On the Measure of Intelligence The Abstraction and Reasoning Corpus (ARC) Human-level concept learning through probabilistic program induction Why AI is Harder Than We Think We Shouldn't be Scared by ‘Superintelligent A.I.' (New York Times opinion piece)

#2: Deep Learning based Recommender Systems with Even Oldridge

Recsperts - Recommender Systems Experts

Play Episode Listen Later Oct 31, 2021 50:07

In episode two I am joined by Even Oldridge, Senior Manager at NVIDIA, who is leading the Merlin Team. These people are working on an open-source framework for building large-scale deep learning recommender systems and have already won numerous RecSys competitions.We talk about the relevance and impact of deep learning applied to recommender systems as well as the challenges and pitfalls of deep learning based recommender systems. We briefly touch on Even's early data science contributions at PlentyOfFish, a Canadian online-dating platform. Starting with personalized recommendations of people to people he transitioned to realtor, a real-estate marketplace. From the potentially biggest social decision in life to the probably biggest financial decision in life he has really been involved with recommender systems at the extremes. At NVIDIA - to which he refers as the one company that works with all the other AI companies - he pushes for Merlin as large-scale, accessible and efficient platform for developing and deploying recommender systems on GPUs.This brought him also closer to the community which he served as industry Co-Chair at RecSys in 2021 as well as to winning multiple RecSys competitions with his team in the recent years.Enjoy this enriching episode of RECSPERTS - Recommender Systems Experts.Links from this Episode: Even Oldridge on LinkedIn and Twitter NVIDIA Merlin NVIDIA Merlin at GitHub Even's upcoming Talk at GTC 2021: Building and Deploying Recommender Systems Quickly and Easily with NVIDIA Merlin PlentyOfFish, realtor fast.ai Twitter RecSys Challenge 2021 Recommending music on Spotify with Deep Learning Papers Dacrema et al. (2019): Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches (best paper award at RecSys 2019) Jannach et al. (2020): Why Are Deep Learning Models Not Consistently Winning Recommender Systems Competitions Yet?: A Position Paper Moreira et al. (2021): Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation Deotte et al. (2021): GPU Accelerated Boosted Trees and Deep Neural Networks for Better Recommender Systems General Links: Follow me on Twitter: https://twitter.com/LivesInAnalogia Send me your comments, questions and suggestions to marcel@recsperts.com Podcast Website: https://www.recsperts.com/

CNNs & ViTs (Vision Transfomers) - Comparing the internal structures, Maithra Raghu, Google

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Oct 29, 2021 5:21

Do Vision Transformers work in the same way as CNNs? Do the internal representational structures of ViTs and CNNs differ? An in-depth analysis article: https://arxiv.org/pdf/2108.08810.pdfListen to the full conversation here: https://youtu.be/htnJxcwJqeADr. Maithra Raghu is a senior research scientist at Google working on analyzing the internal workings of deep neural networks so that we can deploy them better keeping humans in the loop. She recently graduated from Cornell University with a Ph.D. in CS and previously graduated from Cambridge University with BA and Masters in Mathematics. She has received multiple awards for her research work including the Forbes 30 under 30.Maithra's Homepage: https://maithraraghu.comAbout the Host:Jay is a Ph.D. student at Arizona State University, doing research on building Interpretable AI models for Medical Diagnosis.Jay Shah: https://www.linkedin.com/in/shahjay22/You can reach out to https://www.public.asu.edu/~jgshah1/ for any queries.Stay tuned for upcoming webinars!#explainableai #reliableai #robustai #machinelearning***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

Learning to Ponder: Memory in Deep Neural Networks with Andrea Banino - #528

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Oct 18, 2021 37:12

Today we're joined by Andrea Banino, a research scientist at DeepMind. In our conversation with Andrea, we explore his interest in artificial general intelligence by way of episodic memory, the relationship between memory and intelligence, the challenges of applying memory in the context of neural networks, and how to overcome problems of generalization. We also discuss his work on the PonderNet, a neural network that “budgets” its computational investment in solving a problem, according to the inherent complexity of the problem, the impetus and goals of this research, and how PonderNet connects to his memory research. The complete show notes for this episode can be found at twimlai.com/go/528.

learning memory ponder deepmind deep neural networks

Learning the internals of Machine Learning systems and tips for PhD | Maithra Raghu, Google Brain

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Sep 30, 2021 75:27

Dr. Maithra Raghu is a senior research scientist at Google working on analyzing the internal workings of deep neural networks so that we can deploy them better keeping humans in the loop. She recently graduated from Cornell University with a PhD in CS and previously graduated from Cambridge University with BA and Masters in Mathematics. She has received multiple awards for her research work including the Forbes 30 under 30.Questions that we cover00:00:00 Introductions00:01:00 To understand more about your research interests, can you tell us what kind of research questions you are interested in while working at Google Brain?00:04:45 What interested you about it and how did you get started?00:15:00 What is one thing that surprises/puzzles you about deep learning effectiveness to date?00:22:05 What's the difference between being a researcher in academia/PhD student vs being a researcher at a big organization (Google)?00:28:35 In what use cases do you think ViTs might be a good choice to perform image analysis over CNN vs where do you think CNNs still have an undoubted advantage?00:37:15 Why does ViT perform better than ResNet only on larger datasets and not on mid-sized datasets or smaller? 00:43:55 In regards to medical imaging tasks, would it be theoretically wrong to pre-train the model on dataset A and fine-tune it on dataset B?00:47:35 Do you think ViT or transformer-based models already have/have the potential to cause a paradigm shift in the way we approach imaging tasks? Why?00:5:25 Medical datasets are often limited in size, what are your views on tackling these problems in the near future00:55:55 From an internal representation perspective, do you think deep neural networks can have the ability of reasoning?00:58:20 How did you decide on your own PhD research topic? Advice you would give to graduate researchers trying to find a research problem for their thesis?01:04:00 Many times researchers/students feel stuck/overwhelmed with a particular project they are working on, how do you suggest based on experience to tackling that?01:10:35 How do you now/as a graduate student used to keep up with the latest research in ML/DL?Maithra's Homepage: https://maithraraghu.comBlogpost talked about: https://maithraraghu.com/blog/2020/Reflections_on_my_Machine_Learning_PhD_Journey/Her Twitter: https://twitter.com/maithra_raghuAbout the Host:Jay is a PhD student at Arizona State University, doing research on building Interpretable AI models for Medical Diagnosis.Jay Shah: https://www.linkedin.com/in/shahjay22/You can reach out to https://www.public.asu.edu/~jgshah1/ for any queries.Stay tuned for upcoming webinars!#explainableai #reliableai #robustai #machinelearning***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

Accelerating AI Innovation by Optimizing Infrastructure. With Dr. Muthian Sivathanu

Microsoft Research India Podcast

Play Episode Listen Later Sep 29, 2021 27:31

Episode 010 | September 28, 2021Artificial intelligence, Machine Learning, Deep Learning, and Deep Neural Networks are today critical to the success of many industries. But they are also extremely compute intensive and expensive to run in terms of both time and cost, and resource constraints can even slow down the pace of innovation. Join us as we speak to Muthian Sivathanu, Partner Research Manager at Microsoft Research India, about the work he and his colleagues are doing to enable optimal utilization of existing infrastructure to significantly reduce the cost of AI.Muthian's interests lie broadly in the space of large-scale distributed systems, storage, and systems for deep learning, blockchains, and information retrieval.Prior to joining Microsoft Research, he worked at Google for about 10 years, with a large part of the work focused on building key infrastructure powering Google web search — in particular, the query engine for web search. Muthian obtained his Ph.D from University of Wisconsin Madison in 2005 in the area of file and storage systems, and a B.E. from CEG, Anna University, in 2000.For more information about the Microsoft Research India click here.RelatedMicrosoft Research India Podcast: More podcasts from MSR IndiaiTunes: Subscribe and listen to new podcasts on iTunesAndroidRSS FeedSpotifyGoogle PodcastsEmail TranscriptMuthian Sivathanu: Continued innovation in systems and efficiency and costs are going to be crucial to drive the next generation of AI advances, right. And the last 10 years have been huge for deep learning and AI and primary reason for that has been the significant advance in both hardware in terms of emergence of GPUs and so on, as well as software infrastructure to actually parallelize jobs, run large distributed jobs efficiently and so on. And if you think about the theory of deep learning, people knew about backpropagation about neural networks 25 years ago. And we largely use very similar techniques today. But why have they really taken off in the last 10 years? The main catalyst has been sort of advancement in systems. And if you look at the trajectory of current deep learning models, the rate at which they are growing larger and larger, systems innovation will continue to be the bottleneck in sort of determining the next generation of advancement in AI.[Music]Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that's impacting technology and society. I'm your host, Sridhar Vedantham.[Music]Sridhar Vedantham: Artificial intelligence, Machine Learning, Deep Learning, and Deep Neural Networks are today critical to the success of many industries. But they are also extremely compute intensive and expensive to run in terms of both time and cost, and resource constraints can even slow down the pace of innovation. Join us as we speak to Muthian Sivathanu, Partner Research Manager at Microsoft Research India, about the work he and his colleagues are doing to enable optimal utilization of existing infrastructure to significantly reduce the cost of AI.[Music]Sridhar Vedantham: So Muthian, welcome to the podcast and thanks for making the time for this.Muthian Sivathanu: Thanks Sridhar, pleasure to be here.Sridhar Vedantham: And what I'm really looking forward to, given that we seem to be in some kind of final stages of the pandemic, is to actually be able to meet you face to face again after a long time. Unfortunately, we've had to again do a remote podcast which isn't all that much fun.Muthian Sivathanu: Right, right. Yeah, I'm looking forward to the time when we can actually do this again in office.Sridhar Vedantham: Yeah. Ok, so let me jump right into this. You know we keep hearing about things like AI and deep learning and deep neural networks and so on and so forth. What's very interesting in all of this is that we kind of tend to hear about the end product of all this, which is kind of, you know, what actually impacts businesses, what impacts consumers, what impacts the health care industry, for example, right, in terms of AI. It's a little bit of a mystery, I think to a lot of people as to how all this works, because... what goes on behind the scenes to actually make AI work is generally not talked about. Muthian Sivathanu: Yeah.Sridhar Vedantham: So, before we get into the meat of the podcast you just want to speak a little bit about what goes on in the background.Muthian Sivathanu: Sure. So, machine learning, Sridhar, as you know, and deep learning in particular, is essentially about learning patterns from data, right, and deep learning system is fed a lot of training examples, examples of input and output, and then it automatically learns a model that fits that data, right. And this is typically called the training phase. So, training phase is where it takes data builds a model how to fit. Now what is interesting is, once this model is built, which was really meant to fit the training data, the model is really good at answering queries on data that it had never seen before, and this is where it becomes useful. These models are built in various domains. It could be for recognizing an image for converting speech to text, and so on, right. And what has in particular happened over the last 10 or so years is that there has been significant advancement both on the theory side of machine learning, which is, new algorithms, new model structures that do a better job at fitting the input data to a generalizable model as well as rapid innovation in systems infrastructure which actually enable the model to sort of do its work, which is very compute intensive, in a way that's actually scalable that's actually feasible economically, cost effective and so on.Sridhar Vedantham: OK, Muthian, so it sounds like there's a lot of compute actually required to make things like AI and ML happen. Can you give me a sense of what kind of resources or how intensive the resource requirement is?Muthian Sivathanu: Yeah. So the resource usage in a machine learning model is a direct function of how many parameters it has, so the more complex the data set, the larger the model gets, and correspondingly requires more compute resources, right. To give you an idea, the early machine learning models which perform simple tasks like recognizing digits and so on, they could run on a single server machine in a few hours, but models now, just over the last two years, for example, the size of the largest model that's useful that state of the art, that achieves state of the art accuracy has grown by nearly three orders of magnitude, right. And what that means is today to train these models you need thousands and thousands of servers and that's infeasible. Also, accelerators or GPUs have really taken over the last 6-7 years and GPUs. A single V-100 GPU today, a Volta GPU from NVIDIA can run about 140 trillion operations per second. And you need several hundreds of them to actually train a model like this. And they run for months together to train a 175 billion model, which is called GPT 3 recently, you need on the order of thousands of such GPUs and it still takes a month.Sridhar Vedantham: A month, that's sounds like a humongous amount of time. Muthian Sivathanu: Exactly, right? So that's why I think just as I told you how the advance in the theory of machine learning in terms of new algorithms, new model structures, and so on have been crucial to the recent advance in the relevance in practical utility of deep learning.Equally important has been this advancement in systems, right, because given this huge explosion of compute demands that these workloads place, we need fundamental innovation in systems to actually keep pace, to actually make sure that you can train them in reasonable time, you can actually do that with reasonable cost.Sridhar Vedantham: Right. Ok, so you know for a long time, I was generally under the impression that if you wanted to run bigger and bigger models and bigger jobs, essentially you had to throw more hardware at it because at one point hardware was cheap. But I guess that kind of applies only to the CPU kind of scenario, whereas the GPU scenario tends to become really expensive, right?Muthian Sivathanu: Yep, yeah.Sridhar Vedantham: Ok, so in which case, when there is basically some kind of a limit being imposed because of the cost of GPUs, how does one actually go about tackling this problem of scale?Muthian Sivathanu: Yeah, so the high-level problem ends up being, you have limited resources, so let's say you can view this in two perspectives, right. One is from the perspective of a machine learning developer or a machine learning researcher, who wants to build a model to accomplish a particular task right. So, from the perspective of the user, there are two things you need. A, you want to iterate really fast, right, because deep learning, incidentally, is this special category of machine learning, where the exploration is largely by trial and error. So, if you want to know which model actually works which parameters, or which hyperparameter set actually gives you the best accuracy, the only way to really know for sure is to train the model to completion, measure accuracy, and then you would know which model is better, right. So, as you can see, the iteration time, the time to train a model to run inference on it directly impacts the rate of progress you can achieve. The second aspect that the machine learning researcher cares about is cost. You want to do it without spending a lot of dollar cost.Sridhar Vedantham: Right.Muthian Sivathanu: Now from the perspective of let's say a cloud provider who runs this, huge farm of GPUs and then offers this as a service for researchers, for users to run machine learning models, their objective function is cost, right. So, to support a given workload you need to support it with as minimal GPUs as possible. Or in other words, if you have a certain amount of GPU capacity, you want to maximize the utilization, the throughput you can get out of those GPUs, and that's where a lot of the work we've been doing at MSR has focused on. How do you sort of multiplex lots and lots of jobs onto a finite set of GPUs, while maximizing the throughput that you can get from them?Sridhar Vedantham: Right, so I know you and your team have been working on this problem for a while now. Do you want to share with us some of the key insights and some of the results that you've achieved so far, because it is interesting, right? Schedulers have been around for a while. It's not that there aren't schedulers, but essentially what you're saying is that the schedulers that exist do not really cut it, given the, intensity of the compute requirements as well as the jobs, as the size of the jobs and models that are being run today in terms of deep learning or even machine learning models, right?Muthian Sivathanu: That's right.Sridhar Vedantham: So, what are your, key insights and what are some of the results that you guys have achieved?Muthian Sivathanu: So, you raise a good point. I mean, schedulers for distributed systems have been around for decades, right. But what makes deep learning somewhat special is that it turns out, in contrast to traditional schedulers, which have to view a job as a black box, because they're meant to run arbitrary jobs. There is a limit to how efficient they can be. Whereas in deep learning, first of all because deep learning is such high impact area with lots, and I mean from an economic perspective, there are billions of dollars spent in these GPUs and so on. So, there is enough economic incentive to extract the last bit of performance out of these expensive GPUs, right. And that lends itself into this realm of- what if we co-design? What if we custom design a scheduler for the specific case of deep learning, right. And that's what we did in the Gandiva project which we published at OSDI in 2018. What we said was, instead of viewing a deep learning job as just another distributed job which is opaque to us, let's actually exploit some key characteristics that are unique to deep learning jobs, right? And one of those characteristics, is that although, as I said, a single deep learning training job can run for days or even months, right, deep within it is actually composed of millions and millions of these what are called mini batches. So, what is a mini batch? A mini batch is an iteration in the training where it reads one set of input training examples, runs it through the model, and then back propagates the loss, and essentially, changes the parameters to fit that input. And this sequence this mini batch repeats over and over again across millions and millions of mini batches. And what makes it particularly interesting and relevant from a systems optimization viewpoint is that from a resource usage perspective and from a performance perspective, mini batches are identical. They may be operating on different data in each mini batch, but the computation they do is pretty much identical. And what that means is we can look at the job for a few mini batches and we can know what exactly is going to do for the rest of its life time, right. And that allows us to, for example, do things like, we can automatically decide which hardware generation is the best fit for this job, because you can just measure it in a whole bunch of hardware configurations. Or when you're distributing the job, you can compare it across a whole bunch of parallelism configurations, and you can automatically figure out, this is the right configuration, right hardware assignment for this particular job, which you couldn't do in an arbitrary job with a distributed scheduler because the job could be doing different things at different times. Like a MapReduce job for example, it would keep fluctuating across how we'd use a CPU, network, storage, and so on, right. Whereas with deep learning there is this remarkable repeatability and predictability, right. What it also allows us to do is, we can then look within a mini batch what happens, and it turns out, one of the things that happens is, if you look at the memory usage, how much GPU memory the training loop itself is consuming, somewhere at the middle of a mini batch, the memory peaks to almost fill the entire GPU memory, right. And then by the time the mini batch ends, the memory usage drops down by like a factor of anywhere between 10 to 50x. Right, and so there is this sawtooth pattern in the memory usage, and so one of the things we did in Gandiva was proposed this mechanism of transparently migrating a job, so you should be able to, on demand checkpoint a job. The scheduler should be able to do it and just move it to a different machine, maybe even essentially different GPU, different machine, and so on, right. And this is very powerful from load balancing. Lots of scheduling things become easy if you do this. Now, when you're doing that, when you are actually moving a job from one machine to another, it helps if the amount of state you need to move is small, right. And so that's where this awareness of mini batch boundaries and so on helps us, because now you can choose when exactly to move it so that you move 50x, smaller amount of state.Sridhar Vedantham: Right. Very interesting, and another part of this whole thing about resources and compute and all that is, I think, the demands on storage itself, right?Muthian Sivathanu: Yeah.Sridhar Vedantham: Because if the models are that big, that you need some really high-powered GPUs to compute, how do you manage the storage requirements?Muthian Sivathanu: Right, right. So, it turns out the biggest requirement from storage that deep learning poses is on the throughput that you need from storage, right. So, as I mentioned, because GPUs are the most expensive resource in this whole infrastructure stack, the single most important objective is to keep GPUs busy all the time, right. You don't want them idling, at all. What that means is the input training data that the model needs in order to run its mini batches, that is to be fed to it at a rate that is sufficient to keep the GPUs busy. And GPUs process, I mean the amount of data that the GPU can process from a compute perspective has been growing at a very rapid pace, right. And so, what that means is, you know, when between Volta series and an Ampere series, for example, of GPUs there is like 3X improvement in compute speed, right. Now that means the storage bandwidth should keep up with that pace, otherwise faster GPU doesn't help. It will be stalling on IO. So, in that context one of the systems we built was the system called Quiver, where we say a traditional remote storage system like the standard model for running this training is...the datasets are large- I mean the data sets can be in terabytes, so, you place it on some remote cloud storage system, like Azure blob or something like that, and you read it remotely from whichever machine does the training, right. And that bandwidth simply doesn't cut it because it goes through network backbone switches and so on, and it becomes insanely expensive to sustain that level of bandwidth from a traditional cloud storage system, right. So what we need, to achieve here is hyper locality. So, ideally the data should reside on the exact machine that runs the training, then it's a local read and it has to reside on SSD and so on, right. So, you need several gigabytes per second read bandwidth.Sridhar Vedantham: And this is to reduce network latency?Muthian Sivathanu: Yes, this is to reduce network latency and congestion, like when it goes through lots of back end, like T1 switches, T2 switches etc. The end-to-end throughput that you get across the network is not as much as what you can get locally, right?Sridhar Vedantham: Right.Muthian Sivathanu: So, ideally you want to keep the data local in the same machine, but as I said, for some of these models, the data set can be in tens of terabytes. So, what we really need is a distributed cache, so to speak, right, but a cache that is locality aware. So, what we have is a mechanism by which, within each locality domain like a rack for example, we have a copy of the entire training data, so, a rack could comprise maybe 20 or 30 machines, so across them you can still fit the training data and then you do peer to peer across machines in the rack for the access to the cache. And within a rack, network bandwidth is not a limitation. You can get nearly the same performance as you could from local SSD, so that's what we did in Quiver and there are a bunch of challenges here, because if every model wants the entire training data to be local to be within the rack, then there is just no cache space for keeping all of that.Sridhar Vedantham: Right.Muthian Sivathanu: Right. So we have this mechanism by which we can transparently share the cache across multiple jobs, or even multiple users without compromising security, right. And we do that by sort of intelligent content addressing of the cache entries so that even though two users may be accessing different copies of the same data internally in the cache, they will refer to the same instance.Sridhar Vedantham: Right, I was actually just going to ask you that question about how do you maintain security of data, given that you're talking about distributed caching, right? Because it's very possible that multiuser jobs will be running simultaneously, but that's good, you answered it yourself. So, you know I've heard you speak a lot about things like micro design and so on. How do you bring those principles to bear in these kind of projects here?Muthian Sivathanu: Right, right. So, I alluded to this a little bit in one of my earlier points, which is the interface, I mean, if you look at a traditional scheduler which we use the job as a black box, right. That is an example of traditional philosophy to system design, where you build each layer independent of the layer above or below it, right, so that, there are good reasons to do it because you know, like multiple use cases can use the same underlying infrastructure, like if you look at an operating system, it's built to run any process, whether it is Office or a browser or whatever, right.Sridhar Vedantham: Right.Muthian Sivathanu: But, in workloads like deep learning, which place particularly high demands on compute and that are super expensive and so on, there is benefit to sort of relaxing this tight layering to some extent, right. So that's the philosophy we take in Gandiva, for example, where we say the scheduler no longer needs to think of it as a black box, it can make use of internal knowledge. It can know what mini batch boundaries are. It can know that mini batch times are repeatable and stuff like that, right. So, co-design is a philosophy that has been gaining traction over the last several years, and people typically refer to hardware, software co-design for example. What we do in micro co-design is sort of take a more pragmatic view to co-design where we say look, it's not always possible to rebuild entire software layers from scratch to make them more tightly coupled, but the reality is in existing large systems we have these software stacks, infrastructure stacks, and what can we do without rocking the ship, without essentially throwing away everything in building everything from a clean slate. So, what we do is very surgical, carefully thought through interface changes, that allow us to expose more information from one layer to another, and then we also introduce some control points which allow one layer to control. For example, the scheduler can have a control point to ask a job to suspend. And it turns out by opening up those carefully thought through interface points, you leave the bulk of the infrastructure unchanged, but yet achieve these efficiencies that result from richer information and richer control, right. So, micro co-design is something we have been adopting, not only in Gandiva and Quiver, but in several other projects in MSR. And MICRO stands for Minimally Invasive Cheap and Retrofittable Co-design. So, it's a more pragmatic view to co-design in the context of large cloud infrastructures.Sridhar Vedantham: Right, where you can do the co-design with the minimum disruption to the existing systems.Muthian Sivathanu: That's right. Sridhar Vedantham: Excellent. [Music]Sridhar Vedantham: We have spoken a lot about the work that you've been doing and it's quite impressive. Do you have some numbers in terms of you know, how jobs will run faster or savings of any nature, do you have any numbers that you can share with us? Muthian Sivathanu: Yeah, sure. So the numbers, as always depend on the workload and several aspects. But I can give you some examples. So, in the Gandiva work that we did. We, introduce this ability to time slice jobs, right. So, the idea is, today when you launch a job in a GPU machine, that job essentially holds on to that machine until it completes, and until that time it has exclusive possession of that GPU, no other job can use it, right. And this is not ideal in several scenarios. You know, one classic example is hyperparameter tuning, where you have a model and you need to decide what exact hyperparameter values like learning rate, etc. actually are the best fit and give the best accuracy for this model. So, people typically do what is called the hyperparameter search where you run maybe 100 instances of the model, see how it's doing, maybe kill some instances spawn of new instances, and so on, right. And hyperparameter exploration really benefits from parallelism. You want to run all these instances at the same time so that you have an apples-to-apples comparison of how they are doing. And if you want to run like 100 configurations and you have only 10 GPUs, that significantly slows down hyperparameter exploration- it serializes it, right. What Gandiva has is an ability to perform fine grained time slicing of the same GPU across multiple jobs, just like how an operating system time slices multiple processes, multiple programs on the same CPU, we do the same in GPU context, right. And because we make use of mini batch boundaries and so on, we can do this very efficiently. And with that we showed that for typical hyperparameter tuning, we can sort of speed up the end-to-end time to accuracy by nearly 5-6x, right. Uh, and so this is one example of how time slicing can help. We also saw that from a cluster wide utilization perspective, some of the techniques that Gandiva adopted can improve overall cluster utilization by 20-30%. Right, and this directly translates to cost incurred to the cloud provider running those GPS because it means with the same GPU capacity, I can serve 30% more workload or vice versa, right, for a given workload I only need 30% lesser number of GPUs.Sridhar Vedantham: Yeah, I mean those savings sound huge and I think you're also therefore talking about reducing the cost of AI making the process of AI itself more efficient. Muthian Sivathanu: That's correct, that's correct. So, the more we are able to extract performance out of the same infrastructure, the cost per model or the cost per user goes down and so the cost of AI reduces and for large companies like Microsoft or Google, which have first party products that require deep learning, like search and office and so on, it reduces the capital expenditure running such clusters to support those workloads.Sridhar VedanthamRight.Muthian Sivathanu: And we've also been thinking about areas such as, today there is this limitation that large models need to run in really tightly coupled hyperclusters which are connected via InfiniBand and so on. And that brings up another dimension of cost escalation to the equation, because these are sparse, the networking itself is expensive, there is fragmentation across hyperclusters and so on. What we showed in some recent work is how can you actually run training of large models in just commodity VMs-these are just commodity GPU VMs- but without any requirement on them being part of the same InfiniBand cluster or hypercluster, but just they can be scattered anywhere in the data center, and more interestingly, we can actually run these off of spot VMs. So Azure, AWS, all cloud providers provide these bursty VMs or low priority VMs, which is away essentially for them to sell spare capacity, right. So, you get them at a significant discount. Maybe 5-10x cheaper price. And the disadvantage, I mean the downside of that is they can go away at any time. They can be preempted when real demand shows up. So, what we showed is it's possible to train such massive models at the same performance, despite these being on spot VMs and spread over a commodity network without custom InfiniBand and so on. So that's another example how you can bring down the cost of AI by reducing constraints on what hardware you need.Sridhar Vedantham: Muthian, we're kind of reaching the end of the podcast, and is there anything that you want to leave the listeners with, based on your insights and learning from the work that you've been doing? Muthian Sivathanu: Yeah, so taking a step back, right? I think continued innovation in systems and efficiency and costs are going to be crucial to drive the next generation of AI advances, right. And the last 10 years have been huge for deep learning and AI and primary reason for that has been the significant advance in both hardware in terms of emergence of GPUs and so on, as well as software infrastructure to actually parallelize jobs, run large distributed jobs efficiently and so on. And if you think about the theory of deep learning, people knew about backpropagation about neural networks 25 years ago. And we largely use very similar techniques today. But why have they really taken off in the last 10 years? The main catalyst has been sort of advancement in systems. And if you look at the trajectory of current deep learning models, the rate at which they are growing larger and larger, systems innovation will continue to be the bottleneck in sort of determining the next generation of advancement in AI.Sridhar Vedantham: Ok Muthian, I know that we're kind of running out of time now but thank you so much. This has been a fascinating conversation.Muthian Sivathanu: Thanks Sridhar, it was a pleasure.Sridhar Vedantham: Thank you

ACM RecSys Winning Solution: Benedikt Schifferer, Bo Liu, Chris Deotte, Even Oldridge #136

Chai Time Data Science

Play Episode Listen Later Jul 31, 2021 83:08

Video Version: https://youtu.be/W3aWEXqIkWk Blog Overview: http://sanyambhutani.com/interview-with-the-nvidia-acm-recsys-2021-winning-team Subscribe here to the newsletter: https://tinyletter.com/sanyambhutani In this Episode, Sanyam Bhutani interviews a panel from the ACM RecSys Winning competition team at NVIDIA. They explain why are RecSys systems such a hard problem, how can GPUs accelerate these, how do we productize such solutions. The team also does a ground basic to a complete overview of their solution. They understand the team's approaches to the problem, how did they arrive at the solution, and the tricks that they discovered and very generously shared in this interview Links: Interview with Even Oldridge: https://youtu.be/-WzXIV8P_Jk Interview with Chris Deotte: https://youtu.be/QGCvycOXs2M Open Source Solution: https://github.com/NVIDIA-Merlin/competitions/tree/main/RecSys2021_Challenge Paper Link: https://github.com/NVIDIA-Merlin/competitions/blob/main/RecSys2021_Challenge/GPU-Accelerated-Boosted-Trees-and-Deep-Neural-Networks-for-Better-Recommender-Systems.pdf Follow: Benedikt Schifferer: Linkedin: https://www.linkedin.com/in/benedikt-schifferer/ Bo Liu: Twitter: https://twitter.com/boliu0 Kaggle: https://www.kaggle.com/boliu0 Chris Deotte: Twitter: https://twitter.com/ChrisDeotte Kaggle: https://www.kaggle.com/cdeotte Even Oldridge Twitter: https://twitter.com/even_oldridge Linkedin: https://www.linkedin.com/in/even-oldridge/ Sanyam Bhutani: https://twitter.com/bhutanisanyam1 Blog: sanyambhutani.com About: https://sanyambhutani.com/tag/chaitimedatascience/ A show for Interviews with Practitioners, Kagglers & Researchers, and all things Data Science hosted by Sanyam Bhutani.

interview winning solution researchers nvidia practitioners data science gpus benedikt deep neural networks sanyam bhutani kagglers

The Benefits of Deep Neural Networks in Hearing Aids feat. Dr. Douglas L. Beck | Oticon

Hearing Matters Podcast

Play Episode Listen Later Jul 13, 2021 17:14 Transcription Available

About the Hearing Matters Podcast The Hearing Matters Podcast discusses hearing technology (more commonly known as hearing aids), best practices, and a growing national epidemic - Hearing Loss. The show is hosted by father and son - Blaise Delfino, M.S., HIS, and Dr. Gregory Delfino, CCC-A. Blaise Delfino and Dr. Gregory Delfino treat patients with hearing loss at Audiology Services, located in Bethlehem, Nazareth, and East Stroudsburg, PA. The Benefits of Deep Neural Networks in Hearing AidsIn this episode, Blaise Delfino discusses the deep neural network in the new Oticon More hearing aid with Dr. Douglas L. Beck, vice president of academic sciences at Oticon. Dr. Beck explains that in Oticon's newest hearing aid, the Oticon More, there is a deep neural network, or DNN, which enables a wearer to have an even better hearing experience than before. He explains that artificial intelligence (AI) is as simple as a thermostat in the refrigerator. It senses when it needs to adjust the temperature and then does so. DNN is a much more sophisticated form of AI. It learns in the same way the human brain does. It's used in a variety of everyday tasks, for example buying something on Amazon. Once you buy a certain item, Amazon will let you know when similar items become available. The general idea of a DNN is that it learns through repetitive action from a collection of samples. In a hearing aid, the DNN is trained with millions of real-life sound scenes – a restaurant, train station or busy street. The DNN learns to identify and balance each sound within it, so the wearer can access the sounds most important to you.The Oticon More was trained with 12 million complex real-life sound, which it then learned to analyze, organize, and balance. This hearing device can utilize the DNN's intelligent capabilities when balancing and prioritizing the sounds that are important to the wearer.The benefit of the DNN is that the wearer's brain has access to the full sound scene, so he/she can hear the person next to him/her, as well as other environmental sounds, all balanced and amplified in a true-to-life way.This is because a DNN provides the brain with more meaningful sound information, which makes sound much clearer and speech easier to follow. In fact, research shows that Oticon More delivers 30 percent more sound to the brain and increase speech understanding by 15 percent.

amazon ai benefits bethlehem beck nazareth hearing loss hearing aids dnn deep neural networks

Logic Programming – O Prolog ainda faz sentido?

Building The Future - AI Portugal Podcast

Play Episode Listen Later Apr 14, 2021 50:46

O episódio de hoje vai se focar num tema um pouco diferente, vamos então nos debruçar sobre Programação Logica e ferramentas como o Prolog. Será que ainda fazem sentido hoje em dia? quando o que ouvimos falar é um mar de artigos sobre Deep Neural Networks, GANs e CNN. Andamos a ver casos mediáticos como o GPT-3 que ultimamente tem levantado muitas questões sobre o que importa mais é realmente entender o contexto e a logica ou uma simples previsão do que seria o próximo passo sem qualquer entendimento logico do contexto ou domínio. Juntem-se a nós para mais uma conversa muito interessante em torno de AI! AI News: A robot that senses hidden objects https://news.mit.edu/2021/robot-senses-hidden-objects-0401 The NVIDIA Grace CPU integrates next-generation Arm Neoverse™ cores https://nvidianews.nvidia.com/news/nvidia-announces-cpu-for-giant-ai-and-high-performance-computing-workloads A.I. researchers urge regulators not to slam brakes on development (cnbc.com) https://www.cnbc.com/2021/03/29/ai-researchers-urge-regulators-not-to-slam-brakes-on-development.html

ai cnn logic programming ainda gpt sentido prolog gans andamos logica juntem deep neural networks

Episode 41: How can deep neural networks reason

Data Science at Home

Play Episode Listen Later Jul 31, 2018 18:03

Today's episode will be about deep learning and reasoning. There has been a lot of discussion about the effectiveness of deep learning models and their capability to generalize, not only across domains but also on data that such models have never seen. But there is a research group from the Department of Computer Science, Duke University that seems to be on something with deep learning and interpretability in computer vision. References Prediction Analysis Lab Duke University https://users.cs.duke.edu/~cynthia/lab.html This looks like that: deep learning for interpretable image recognition https://arxiv.org/abs/1806.10574

duke university computer science deep neural networks

Alain Tapp on How Deep, Deep Learning Really is?

Finding Genius Podcast

Play Episode Listen Later Mar 2, 2017 35:35

Alain Tapp is a veteran in technology, having done his masters in Cryptography, then a PhD in Quantum Computing and is now working on AI & Deep Learning. Alain details how machine learning works, and what are some of the approaches to training machines to "learn" and interpret. From machines understanding metaphors and sarcasm to passing the Turing test, there are still many unreachable areas of natural language processing by machines. Reinforcement Learning and the Deep Neural Networks could be the future of machine learning which could generally make an AI smarter over time. This is a great interview and a must listen for all AI & Machine Learning enthusiasts. Listen, subscribe, and review Future Tech Podcast. Contribute Bitcoin to fuel our interviews and keep us going!

ai phd machine learning alain quantum computing deep learning turing cryptography tapp reinforcement learning deep deep deep neural networks future tech podcast

Podcasts about deep neural networks

Best podcasts about deep neural networks

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

VUX World

Machine Learning Podcast - Jay Shah

Data Science at Home

Mind & Matter

PaperPlayer biorxiv neuroscience

Investorideas -Trading & News

The AI Eye: stock news & deal tracker

Chai Time Data Science

Latest news about deep neural networks

Latest podcast episodes about deep neural networks

505 Bumpers 89

#144 Why is Bayesian Deep Learning so Powerful, with Maurizio Filippone

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734

AI Energy Measurement for Beginners

Copilots for Running Machines? GenAI Mindset and New Industrial Tech Jobs

Why Do Animals Sleep? | Vlad Vyazovskiy | 202

Sleep: Neural Circuits, Orexin/Hypocretin, Hypothalamus, Neuromodulators, Stress & Cortisol, Sleep Drugs & Ultrasound Technology | Luis de Lecea | #168

#107 Amortized Bayesian Inference with Deep Neural Networks, with Marvin Schmitt

Ramblings of a Designer eps. 158 - Diego Sabajo

ECMO PAL: using deep neural networks for survival prediction in venoarterial extracorporeal membrane oxygenation

ACR 2023 Podcast - Day2A

Amr Awadallah, CEO of Vectara and co-founder of Cloudera, discusses the future of AI search

Mitsubishi Scientist: Are Deep Neural Networks Smarter than Second Graders?

FODD 11 - Abhishek Pandey - Machine Learning Blueprints for Pharma Discovery Breakthroughs

Conversations: Eric Topol, MD

193: The potency of rock-physics-guided deep neural networks

The future of human vision

The Importance of Energy Efficiency in AI with Vivienne Sze

The Godfather of AI is Worried About AI

April 25th, 2023 | Apparently everyone's leaking your data

Factorized visual representations in the primate visual system and deep neural networks

Modelling Phenomenological Differences in Aetiologically Distinct Visual Hallucinations Using Deep Neural Networks

Inferring Line-of-Sight Velocities and Doppler Widths from Stokes Profiles of GST NIRIS Using Stacked Deep Neural Networks

Invariance of object detection in untrained deep neural networks

Comparison of visual quantities in untrained deep neural networks

שישה מאמרים שכל דאטא סיינטינסט חייב להכיר עם שקד זיכלינסקי

Energy functions and shortcut learning

DoK Talks #141 - Dossier: multi-tenant distributed Jupyter Notebooks // Iacoppo Colonnelli & Dario Tranchitella

Explainability, Human Aware AI & sentience in large language models | Dr. Subbarao Kambhampati

Xenobots and Critical Learning Periods

Enhancing AI Diagnosis

Episode 10: Deep Neural Networks in Julia with Flux.jl

Season 2 Episode 16: Grace Lindsay - Computational neuroscience and her book, "Models of the Mind"

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers by lifelonglearner, Peter Hase

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers by lifelonglearner, Peter Hase

Cockpit: A Practical Debugging Tool for Training Deep Neural Networks

#13 National Influencers, International Norms

Erik Hoel: Dreams, Sleep, Deep Learning, Evolutionary Function of Fiction & Art | #43

Biological Intelligence and the Limitations of Deep Neural Networks – Intel on AI Season 3, Episode 2

#2: Deep Learning based Recommender Systems with Even Oldridge

CNNs & ViTs (Vision Transfomers) - Comparing the internal structures, Maithra Raghu, Google ​

Learning to Ponder: Memory in Deep Neural Networks with Andrea Banino - #528

Learning the internals of Machine Learning systems and tips for PhD | Maithra Raghu, Google Brain

Accelerating AI Innovation by Optimizing Infrastructure. With Dr. Muthian Sivathanu

ACM RecSys Winning Solution: Benedikt Schifferer, Bo Liu, Chris Deotte, Even Oldridge #136

The Benefits of Deep Neural Networks in Hearing Aids feat. Dr. Douglas L. Beck | Oticon

Logic Programming – O Prolog ainda faz sentido?

Episode 41: How can deep neural networks reason

Alain Tapp on How Deep, Deep Learning Really is?

CNNs & ViTs (Vision Transfomers) - Comparing the internal structures, Maithra Raghu, Google