Podcasts about convolutional

  • 37 podcasts
  • 76 episodes
  • 43m average duration
  • Infrequent episodes
  • Latest episode: Feb 13, 2025

POPULARITY

[Popularity chart, 2017-2024]



Latest podcast episodes about convolutional

Seismic Soundoff
249: Machine Learning Methods in Geoscience

Seismic Soundoff

Play Episode Listen Later Feb 13, 2025 24:14


"The biggest challenge for geophysicists? Learning machine learning's 'new language' from the world of statistics." Machine learning is transforming geoscience, and Gerard Schuster explains how. This conversation explores key ML applications in seismic interpretation, the role of convolutional neural networks in fault detection, and why hands-on labs are essential for mastering these techniques. With real-world examples and insights from his new book, Machine Learning Methods in Geoscience, this episode delivers practical knowledge for integrating ML into geophysics.

KEY TAKEAWAYS
> Why ML matters for geoscientists – The demand for ML skills is growing, and Jerry shares how this shift shapes education and careers.
> CNNs in action – Convolutional neural networks are used to detect rock cracks in Saudi Arabia through drone imagery.
> Transformers vs. traditional neural networks – Transformers process seismic data differently by capturing long-range dependencies, offering new advantages.

NEXT STEP
Explore Machine Learning Methods in Geoscience by Gerard Schuster, featuring hands-on MATLAB and Colab labs. Get the book and start applying ML techniques today! https://library.seg.org/doi/epdf/10.1190/1.9781560804048.fm

TEXT A FRIEND
These are great insights on how ML is actually being used in seismic work, not just theory. https://seg.org/podcasts/episode-249-machine-learning-methods-in-geoscience

GUEST BIO
Gerard Schuster has an M.S. (1982) and a Ph.D. (1984) from Columbia University and was a postdoctoral researcher there from 1984 to 1985. From 1985 to 2009, he was a professor of geophysics at the University of Utah, and from 2009 to 2021 a professor of geophysics at KAUST. He is currently a research professor at the University of Utah, where he received several teaching and research awards. He was editor of GEOPHYSICS 2004–2005 and was awarded SEG's Virgil Kauffman Gold Medal in 2010 for his work in seismic interferometry. His previous books are Seismic Interferometry (2009, Cambridge Press) and Seismic Inversion (2017, SEG).

LINKS
* Buy the print book at https://seg.org/shop/product/?id=fe5a3cd3-77b2-ef11-b8e8-6045bda82e05
* Visit https://seg.org/podcasts/episode-249-machine-learning-methods-in-geoscience for the full guest bios and show notes.

CALL FOR ABSTRACTS
Technical Program Chairs Yingcai Zheng and Molly Turko invite you to submit your best work. This year, we're fostering deeper collaboration between SEG, AAPG, and SEPM. Focus on regional challenges and how integrated geoscience can unlock solutions. Submit short or expanded abstracts for oral and poster presentations. The Call for Abstracts is open and closes on 15 March at 5:00 PM CT. Don't miss this opportunity to share your research and connect with the broader geoscience community at https://www.imageevent.org/.

SHOW CREDITS
Andrew Geary at TreasureMint hosted, edited, and produced this episode. The SEG podcast team comprises Jennifer Cobb, Kathy Gamble, and Ally McGinnis. If you have episode ideas or feedback for the show or want to sponsor a future episode, email the show at podcast@seg.org.

Scope It Out with Dr. Tim Smith
Episode 90: Enhancing nasal endoscopy: Classification, detection, and segmentation of anatomic landmarks using a convolutional neural network

Scope It Out with Dr. Tim Smith

Play Episode Listen Later Jul 10, 2024 24:16


In this episode, host Dr. Sarah Wise speaks with Dr. Edward McCoul. They discuss the recently published article: Enhancing nasal endoscopy: Classification, detection, and segmentation of anatomic landmarks using a convolutional neural network. Read the full open access article in the International Forum of Allergy and Rhinology. Listen and subscribe for free to Scope It […]

Rocking Reservoir Waves
Integrating Rock Physics Model with Convolutional Neural Network

Rocking Reservoir Waves

Play Episode Listen Later Jan 31, 2024 13:45


In this new podcast episode, let's explore the details of the Convolutional Neural Network (CNN) used in our workflow. #ReservoirExpert Jyoti will discuss how we use a synthetic catalog for CNN training and incorporate seismic and well data during transfer learning to enhance the training process. This episode aims to help you understand the value these technological advances add, particularly in exploration settings where well control is limited.
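The recipe described here, pretraining on a synthetic catalog and then adapting with real seismic and well data, is standard transfer learning. Below is a generic, illustrative sketch, not the episode's actual workflow: the "backbone" and all data are random stand-ins, and only the pattern is meant to be accurate, i.e. freeze the pretrained feature extractor and fit only a new head on the small real dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

def pretrain_on_synthetic(n_features=8):
    """Stand-in for CNN pretraining on a synthetic catalog: returns
    'learned' backbone weights (random here, just to show the flow)."""
    return rng.standard_normal((n_features, n_features))

def finetune_head(backbone, X_real, y_real):
    """Transfer-learning step: freeze the pretrained backbone and fit
    only a new linear head on the small real dataset."""
    feats = X_real @ backbone                 # frozen feature extraction
    w, *_ = np.linalg.lstsq(feats, y_real, rcond=None)
    return w

backbone = pretrain_on_synthetic()
X_real = rng.standard_normal((20, 8))         # small "real" dataset
y_real = X_real @ backbone @ rng.standard_normal(8)   # toy targets
w = finetune_head(backbone, X_real, y_real)
print(np.allclose(X_real @ backbone @ w, y_real))     # True
```

Because only the small head is trained, far less labeled real data is needed than training the whole network from scratch, which is the point when well control is limited.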

MULTIVERSES
23| Paulina Sliwa — Moral philosophy as puzzles of daily life

MULTIVERSES

Play Episode Listen Later Jan 18, 2024 71:58


Why do men do less housework? What happens when an apology is offered? What are we looking for when we ask for advice? These are the sorts of problems drawn from everyday experience that Paulina Sliwa intends to resolve, and in doing so make sense of the ways we negotiate blame and responsibility. Paulina is a Professor of Moral & Political Philosophy at the University of Vienna. She looks carefully at evidence accessible to us all — daily conversations, testimony from shows like This American Life, and our own perceptions — and uses these to unravel our moral practices. The results are sometimes surprising yet always grounded. For example, Paulina argues that remorse is not an essential feature of an apology, nor is accepting that the behavior was unjustified. This is illuminating for its insights into moral problems, but I also really enjoyed seeing how Paulina thinks; it's a wonderful example of philosophical tools at work.

Links
* Paulina's website with links to many of her papers
* Multiverses.xyz

Milestones
(0:00) Intro
(3:00) Start of conversation: grand systems vs ordinary practices of morality
(5:30) Philosophy and evidence
(6:39) Apologies
(8:40) Anne of Green Gables: an overblown apology
(10:50) Remorse is not an essential feature of apologies
(12:00) Apologies involve accepting some blame
(15:30) Why apology is not saying I won't do it again
(17:17) Essential vs non-essential features of apologies
(18:12) Apologies occur in many different shapes: is a unified account possible?
(20:00) Moral footprints
(24:10) Apologies and politeness
(26:20) Tiny apologies as a commitment to moral norms
(29:50) Moral advice — verdictive vs hermeneutic (making sense)
(33:30) Moral advice doesn't need to get us to the right answer, but it should get us closer
(36:30) Perspectives, affordances and options
(38:40) Perspectives vs facts
(46:45) Housework: Gendered Domestic Affordance Perception
(49:40) Evidence that affordances are directly perceived (and not inferred)
(52:00) Convolutional neural networks as a model of perception
(53:00) Environmental dependency syndrome
(54:30) Perceptions are not fixed
(59:30) Perception is not a transparent window on reality
(1:01:00) Tools of a philosopher
(1:03:20) A Terribly Serious Adventure - Philosophy at Oxford 1900-60 — Nikhil Krishnan
(1:04:50) Philosophy as continuous with science
(1:06:17) Philosophy is not a neutral enterprise
(1:09:00) Santa: read letters!
(1:10:10) Apologise less

A Beginner's Guide to AI
Racing Toward Utopia or Dystopia? The High-Octane Ideas of Effective Accelerationism

A Beginner's Guide to AI

Play Episode Listen Later Sep 5, 2023 11:43


Today's episode explored the futuristic ideology of Effective Accelerationism. This movement advocates hastening progress towards benevolent artificial general intelligence that could help humanity flourish. We defined key terms like AGI and examined the potential benefits as well as the risks of rushing ahead with advanced AI without proper safeguards. Through a case study on malaria eradication, we saw how e/accs believe superhuman intelligence could solve global problems like disease. Critics caution that accelerating uncontrolled AGI could backfire catastrophically, but e/accs contend careful, managed progress is humanity's best shot at utopia. This bold, divisive ideology compels us to scrutinize assumptions about technology, progress, and the future. What do you think - should we accelerate or apply the brakes when it comes to AGI? Share your perspective with us. This podcast was generated with the help of artificial intelligence. We do fact check with human eyes, but there might still be hallucinations in the output. Music credit: "Modern Situations by Unicorn Heads"

---

THE CONTENT OF THE EPISODE

Demystifying the Inner Workings of Deep Learning

Introducing Artificial Neural Networks
In today's episode, we'll be diving into the fascinating world of deep learning. This powerful subset of machine learning relies on artificial neural networks, which are inspired by the biological neural networks in our brains. These artificial neural nets are composed of layers of simple computing nodes that pass information to each other, allowing them to identify increasingly complex features in data.

How Deep Neural Networks Learn
Deep learning uses multi-layered artificial neural networks to recognize intricate patterns in data with human-like accuracy. Each layer identifies increasingly complex features, allowing networks with many layers to model very complex concepts. Nodes are interconnected using weights and biases. Tweaking these parameters through backpropagation allows the network to learn.

Different Neural Network Architectures
Different network architectures like convolutional or recurrent neural nets are optimized for various data types like images or text. Convolutional neural networks excel at processing grid-like image data. Recurrent neural networks are ideal for sequential data like text or audio.

Real-World Applications
Deep learning is driving breakthroughs in self-driving cars, medical imaging, natural language processing, and more. For example, deep learning is being used to analyze complex medical images and reduce diagnostic errors, like detecting breast cancer in mammograms and lung tumors in CT scans.

Key Takeaways
In this episode, we aimed to demystify the core concepts so you have a mental model for how deep learning algorithms work. Now that you have a solid base of knowledge, we can dive further into specific applications in future episodes. We hope reviewing key ideas helps reinforce today's lesson. Let us know how you scored on the interactive trivia challenge!
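The "grid-like image data" that CNNs excel at boils down to one core operation: sliding a small kernel over the image. A minimal NumPy illustration follows, with a toy image and kernel of our own invention (deep learning frameworks technically compute this cross-correlation and call it convolution):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1) and sum
    the elementwise products at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image with a dark-to-bright vertical edge, and a classic
# vertical-edge-detecting kernel:
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[1, 0, -1]] * 3, dtype=float)
print(conv2d(image, kernel))   # every window spans the edge: all -3
```

A CNN layer learns many such kernels from data rather than hand-coding them; early layers end up detecting edges, later layers increasingly complex features, exactly the hierarchy described above.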

Papers Read on AI
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

Papers Read on AI

Play Episode Listen Later Aug 20, 2023 44:47


Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a shared embedding space, which bridges the gap between closed-vocabulary and open-vocabulary recognition. Hence, existing methods often adopt a two-stage framework to tackle the problem, where the inputs first go through a mask generator and then through the CLIP model along with the predicted masks. This process involves extracting features from images multiple times, which can be ineffective and inefficient. By contrast, we propose to build everything into a single-stage framework using a shared frozen convolutional CLIP backbone, which not only significantly simplifies the current two-stage pipeline, but also remarkably yields a better accuracy-cost trade-off. The proposed FC-CLIP benefits from the following observations: the frozen CLIP backbone maintains the ability of open-vocabulary classification and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to a larger input resolution than the one used during contrastive image-text pretraining. When training on COCO panoptic data only and testing in a zero-shot manner, FC-CLIP achieves 26.8 PQ, 16.8 AP, and 34.1 mIoU on ADE20K; 18.2 PQ and 27.9 mIoU on Mapillary Vistas; and 44.0 PQ, 26.8 AP, and 56.2 mIoU on Cityscapes, outperforming the prior art by +4.2 PQ, +2.4 AP, and +4.2 mIoU on ADE20K, +4.0 PQ on Mapillary Vistas, and +20.1 PQ on Cityscapes, respectively. Additionally, the training and testing time of FC-CLIP is 7.5x and 6.6x faster, respectively, than the same prior art, while using 5.9x fewer parameters. FC-CLIP also sets a new state-of-the-art performance across various open-vocabulary semantic segmentation datasets.
Code at https://github.com/bytedance/fc-clip
2023: Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen
https://arxiv.org/pdf/2308.02487v1.pdf
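The single-stage idea can be sketched schematically. The code below is not the authors' implementation: the backbone and mask head are random stand-ins, and only the data flow is meant to reflect the abstract, i.e. one shared frozen feature extraction feeding both mask prediction and per-mask embeddings, instead of running a backbone twice.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(image):
    """Stand-in for the frozen convolutional CLIP image encoder:
    one forward pass yields a dense feature map (random here)."""
    return rng.standard_normal((16, 16, 64))

def mask_head(features, num_masks=5):
    """Hypothetical mask generator predicting binary masks from the
    shared features (random here; only the data flow matters)."""
    return rng.random((num_masks,) + features.shape[:2]) > 0.5

def pool_mask_embeddings(features, masks):
    """Average the shared features inside each predicted mask to get
    one embedding per mask; no second pass through a backbone."""
    embs = []
    for m in masks:
        region = features[m]                      # (n_pixels, C)
        embs.append(region.mean(axis=0) if len(region)
                    else np.zeros(features.shape[-1]))
    return np.stack(embs)

feats = frozen_backbone(image=None)               # ONE feature extraction
masks = mask_head(feats)
mask_embs = pool_mask_embeddings(feats, masks)    # (5, 64)
# These per-mask embeddings can then be compared against CLIP text
# embeddings of arbitrary category names for open-vocabulary labels.
print(mask_embs.shape)
```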

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion
AI Today Podcast: AI Glossary Series – Convolutional Neural Network (CNN)

AI Today Podcast: Artificial Intelligence Insights, Experts, and Opinion

Play Episode Listen Later May 24, 2023 14:05


In this episode of the AI Today podcast, hosts Kathleen Walch and Ron Schmelzer define the term Convolutional Neural Network (CNN), explain how it relates to AI, and explain why it's important to know about. Want to dive deeper into an understanding of artificial intelligence, machine learning, or big data concepts? Want to learn how to apply AI and data using hands-on approaches and the latest technologies? Continue reading AI Today Podcast: AI Glossary Series – Convolutional Neural Network (CNN) at AI & Data Today.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later May 20, 2023 66:43


We are excited to be the first podcast in the world to release an in-depth interview on the new SOTA in commercially licensed open source models: MosaicML MPT-7B!

The Latent Space crew will be at the NYC Lux AI Summit next week, and have two meetups in June. As usual, all events are on the Community page! We are also inviting beta testers for the upcoming AI for Engineers course. See you soon!

One of GPT-3's biggest limitations is context length: you can only send it up to 4,000 tokens (3k words, 6 pages) before it throws a hard error, requiring you to bring in LangChain and other retrieval techniques to process long documents and prompts. But MosaicML recently open sourced MPT-7B, the newest addition to their Foundation Series, with context length going up to 84,000 tokens (63k words, 126 pages).

This transformer model, trained from scratch on 1 trillion tokens of text and code (compared to 300B for Pythia and OpenLLaMA, and 800B for StableLM), matches the quality of LLaMA-7B. It was trained on the MosaicML platform in 9.5 days on 440 GPUs with no human intervention, costing approximately $200,000. Unlike many open models, MPT-7B is licensed for commercial use and is optimized for fast training and inference through FlashAttention and FasterTransformer.

They also released 3 finetuned models starting from the base MPT-7B:
* MPT-7B-Instruct: finetuned on dolly_hhrlhf, a dataset built on top of dolly-5k (see our Dolly episode for more details).
* MPT-7B-Chat: finetuned on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets.
* MPT-7B-StoryWriter-65k+: finetuned with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. While 65k is the advertised size, the team has gotten up to 84k tokens in response when running on a single node of A100-80GB GPUs. ALiBi is the dark magic that makes this possible.
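For readers wondering what ALiBi (Attention with Linear Biases) actually does, here is a minimal NumPy sketch (our own illustration, not MosaicML's code) of the per-head bias it adds to attention scores in place of position embeddings:

```python
import numpy as np

def alibi_bias(seq_len, num_heads):
    """ALiBi: add a linear penalty to attention scores that grows with
    query-key distance. Head slopes follow the ALiBi paper's geometric
    sequence 2^(-8i/n) for n heads."""
    slopes = np.array([2 ** (-8 * (i + 1) / num_heads)
                       for i in range(num_heads)])
    pos = np.arange(seq_len)
    distance = pos[:, None] - pos[None, :]          # keys behind query i
    distance = np.where(distance < 0, 0, distance)  # future is masked anyway
    return -slopes[:, None, None] * distance        # (heads, L, L)

bias = alibi_bias(seq_len=6, num_heads=4)
# The bias is added to Q.K^T before softmax. Because it is relative and
# needs no learned position table, the model can run on sequences longer
# than those seen in training, which is how StoryWriter stretches 65k
# training context to 84k at inference.
print(bias.shape)   # (4, 6, 6)
```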
Turns out The Great Gatsby is only about 68k tokens, so the team used the model to create new epilogues for it!

On top of the model checkpoints, the team also open-sourced the entire codebase for pretraining, finetuning, and evaluating MPT via their new MosaicML LLM Foundry. The table we showed above was created using LLM Foundry's in-context-learning eval framework itself!

In this episode, we chatted with the leads of MPT-7B at Mosaic: Jonathan Frankle, Chief Scientist, and Abhinav Venigalla, Research Scientist, who spearheaded the MPT-7B training run. We talked about some of the innovations they've brought into the training process to remove the need for 2am on-call PagerDutys, why the LLM dataset mix is such an important yet dark art, and why some of the traditional multiple-choice benchmarks might not be very helpful for the type of technology we are building.

Show Notes
* Introducing MPT-7B
* Cerebras
* Lottery Ticket Hypothesis
* Hazy Research
* ALiBi
* Flash Attention
* FasterTransformer
* List of naughty words for C4 https://twitter.com/code_star/status/1661386844250963972
* What is Sparsity?
* Hungry Hungry Hippos
* BF16 FP

p.s. yes, MPT-7B really is codenamed LLongboi!

Timestamps
* Introductions [00:00:00]
* Intro to Mosaic [00:03:20]
* Training and Creating the Models [00:05:45]
* Data Choices and the Importance of Repetition [00:08:45]
* The Central Question: What Mix of Data Sets Should You Use? [00:10:00]
* Evaluation Challenges of LLMs [00:13:00]
* Flash Attention [00:16:00]
* Fine-tuning for Creativity [00:19:50]
* Open Source Licenses and Ethical Considerations [00:23:00]
* Training Stability Enhancement [00:25:15]
* Data Readiness & Training Preparation [00:30:00]
* Dynamic Real-time Model Evaluation [00:34:00]
* Open Science for Affordable AI Research [00:36:00]
* The Open Approach [00:40:15]
* The Future of Mosaic [00:44:11]
* Speed and Efficiency [00:48:01]
* Trends and Transformers [00:54:00]
* Lightning Round and Closing [01:00:55]

Transcript

Alessio: [00:00:00] Hey everyone.
Welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. I'm joined by my co-host, Swyx, writer and editor of Latent Space.
Swyx: Hey, and today we have Jonathan and Abhi from MosaicML. Welcome to our studio.
Jonathan: Guys, thank you so much for having us. Thanks so much.
Swyx: How's it feel?
Jonathan: Honestly, I've been doing a lot of podcasts during the pandemic, and it has not been the same.
Swyx: No, not the same actually. So you have on your bio that you're primarily based in Boston,
Jonathan: New York. New York, yeah. My Twitter bio was a probability distribution over locations.
Swyx: Exactly, exactly. So I DMd you because I was obviously very interested in MPT-7B, and I was like, for the 0.2% of the time that you're in San Francisco, can you please come to a podcast studio, and you're like, I'm there next week.
Jonathan: Yeah, it worked out perfectly.
Swyx: We're really lucky to have you. I'll read off a few intros that people should know about you and then you can fill in the blanks. So Jonathan, you did your BS and MS at Princeton in programming languages and then found your way into ML for your PhD at MIT, where you made a real splash with the lottery ticket hypothesis in 2018, which people can check up on. I think you've done a few podcasts about it over the years, which has been highly influential, and we'll talk about sparse models at Mosaic. You have also had some side [00:01:30] quests. You taught programming for lawyers and you did some law and privacy stuff in, in DC and also did some cryptography stuff. Um, and you became an assistant professor at Harvard before earning your PhD.
Jonathan: I've yet to start.
Swyx: You, you yet to start. Okay. But you just got your PhD.
Jonathan: I technically just got my PhD. I was at Mosaic, which delayed my defense by about two years. It was, I was at 99% done for two years.
Got the job at Harvard, Mosaic started, and I had better things to do than write my dissertation for two years.
Swyx: You know, you know, this is very out of order.
Jonathan: Like, oh, completely out of order, completely backwards. Go talk to my advisor about that. He's also an advisor at Mosaic and has been from the beginning. And, you know, go talk to him about finishing on time.
Swyx: Great, great, great. And just to fill it out, Abhi, you did your BS and MS at MIT, you were a researcher at Cerebras, and you're now a research scientist at Mosaic. Just before we go into Mosaic stuff, I'm actually very curious about Cerebras and, uh, just that, that space in general. Um, what are they doing that people should know about?
Abhinav: Yeah, absolutely. Um, I think the biggest thing about Cerebras is that they're really building, you know, kind of the next-gen computing platform beyond, like, GPUs. Um, they're trying to build a system that uses an entire wafer, you know, rather than cutting up a wafer into smaller chips, and trying to train a model on that entire system, or actually more recently on many such wafers. Um, so it's, and it's really extraordinary. I think it's like the first time ever that kind of wafer-scale computing has ever really worked. And so it's a really exciting time to be there, trying to figure out how we can map ML workloads to work, um, on a much, much bigger chip.
Swyx: And do you use like [00:03:00] a different programming language or framework to do that? Or is that like..
Abhinav: Yeah, so I mean, things have changed a bit since I was there. I think, um, you can actually run just normal TensorFlow and PyTorch on there. Um, so they've built a kind of software stack that compiles it down. So it actually just kind of works naturally. But yeah.
Jonathan: Compiled versions of Python is a hot topic at the moment with Mojo as well.
Swyx: And then Mosaic, you, you spearheaded the MPT-7B effort.

INTRO TO MOSAIC [00:03:20]

Abhinav: Uh, yeah.
Yeah, so it's kind of like, it's been maybe six months, 12 months in the making. We kind of started working on LLMs sort of back in the summer of last year. Um, and then we came out with this blog post where we kind of profiled a lot of LLMs and saw, hey, the cost of training is actually a lot lower than what people might think. Um, and then since then, you know, being inspired by kind of, you know, Meta's release of the LLaMA models and lots of other open source work, we kind of started working towards, well, what if we were to release a really good kind of 7 billion parameter model? And that's what MPT is.
Alessio: You know, we mentioned some of the podcasts you had done, Jonathan, I think in one of them you mentioned Mosaic was not planning on building a model and releasing it, and obviously you eventually did. So what are some of the things that got you there, beyond LLaMA, which you mentioned was an inspiration? You now have both the training and, like, inference products that you offer. Was this more of a research challenge in a way, uh, that you wanted to do? Or how did the idea come to be?
Jonathan: I think there were a couple of things. So we still don't have a first-class model. We're not an OpenAI where, you know, our business is, come use our one great model. Our business is built around customers creating their own models. But at the end of the day, if customers are gonna create their own models, we have to have the tools to help them do that, and to have the tools to help them do that and know that they work, we have to create our own models to start. We have to know that we can do something great if customers are gonna do something great.
And one too many people may have challenged me on Twitter about the fact that, you know, Mosaic claims all these amazing numbers, but, you know, not to, you know, call out Ross Wightman here, but, you know, I believe he said at some point, you know, show us the pudding. Um, and so Ross, you know, please let me know how the pudding tastes. But in all seriousness, like, I think there is something, this is a demo in some sense. This is to say we did this in 9.5 days for a really reasonable cost, straight through, no intervention, 200K. Yep. Um, you can do this too.
Swyx: Uh, and just to reference the numbers that you're putting out, this is, uh, last year you were making a lot of noise for training GPT-3 under 450K, which is your, your initial estimate. Um, and then it went down to 100K, and Stable Diffusion 160K going down to less than 50K as well.
Jonathan: So I will be careful about that 100K number. That's certainly the challenge I've given Abhi to hit. Oh, I wouldn't make the promise that we've hit it yet, but you know, it's certainly a target that we have. And I, you know, Abhi may kill me for saying this. I don't think it's crazy.

TRAINING AND CREATING THE MODELS [00:05:45]

Swyx: So we definitely want to get into, like, estimation math, right? Like what, what needs to happen for those big order-of-magnitude changes in, in infrastructure costs. But, uh, let's kind of stick to the MPT-7B story. Yeah. Tell us everything. Like, you have, uh, three different models. One of them, state of the art essentially on context length. Let's talk about the process of training them, the, uh, the decisions that you made. Um, I can go into, you know, individual details, but I just wanna let you rip.
Abhinav: Yeah, so I mean, I think, uh, we started off with the base model, which is, kind of for all practical purposes, a recreation of LLaMA 7B. Um, so it's a 7 billion parameter model trained on a trillion tokens.
Um, and our goal was like, you know, we should do it efficiently. We should be able to do it, like, kind of hands-free, so we don't have to babysit the runs as they're doing them. And it could be kind of a, a launching point for these fine-tuned models, and those fine-tuned models, you know, on, on the one hand they're kind of really fun for the community, like the StoryWriter model, which has like a 65,000-token context window, and you can even kind of extrapolate beyond that. Um, but they're, they're also kind of just inspirations really. So you could kind of start with an MPT-7B base and then build your own custom, you know, downstream. If you want a long-context code model, you could do that with our platform. If you wanted one that was for a particular language, you could do that too. But yeah, so we picked kind of the three variants, chat and instruct and StoryWriter, just kind of like inspirations, looking at what people were doing in the community today. Yeah.
Alessio: And what's the beginning of the math to come up with, you know, how many tokens you wanna train it on? How many parameters do you want in a model? 7 billion and 30 billion seem to be kind of like two of the magic numbers going around right now.
Abhinav: Yeah, definitely. Definitely. Yeah, I think, like, there's sort of these scaling laws which kind of tell you how to best spend your training compute if that's all you cared about. So if you wanna spend $200,000 exactly in the most efficient way, there'd be a recipe for doing that. Um, and there we usually go by the Chinchilla laws. Now for these models, we actually didn't quite do that, because we wanted to make sure that people could actually run these at home and that they [00:07:30] were good for inference.
So we trained them kind of beyond those Chinchilla points, so that we're almost over-training them. I think there's like a joke going online that they're like "long boy," and that came up internally because we were training them for really, really long durations. So that 7B model, the Chinchilla point might be 140 billion tokens. Instead, we trained a trillion, so almost seven times longer than you normally would.
Swyx: So longboi was the code name. So is it, is it the training method? Is it the scaling law that you're trying to coin, or is it the code name for the 65k model?
Jonathan: Uh, 65k. It was just an internal joke for the, for training on way more tokens than you would via Chinchilla. Okay. Um, we can coin it longboi, and it, it really stuck, but just to, you know, LLongboi is spelled with two Ls at the beginning. Yeah. Cause you know, we wanted the LLaMA thing in there as well.
Jonathan: Yeah, yeah, yeah. Our darn CEO, we have to rein him in, that guy, you know, you can't, yeah. I'm gonna take away his Twitter password at some point. Um, but you know, he had to let that one out publicly. And then I believe there was a YouTube video where someone happened to see it mentioned before the model came out and called it the "Long G boy" or something like that. Like, so you know, now it's out there in the world. It's out there. It's like Sydney, can't put it back in.
Swyx: There's a beautiful picture which I think Naveen tweeted out, which, um, shows a long boy on a whiteboard.
Jonathan: That was the origin of LLongboi. In fact, the legs of the LLaMA were the two Ls and the long boy.

DATA CHOICES AND THE IMPORTANCE OF REPETITION [00:08:45]

Swyx: Well, talk to me about your data choices, right? Like, this is your passion project. Like, what can you tell us about it?
Jonathan: Yeah, I think Abhi wanted to kill me by the end for trying to use all the GPUs on data and none of them on actually training the model.
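The "chinchilla point" figures quoted above follow from the common rule of thumb drawn from the Chinchilla scaling laws, roughly 20 training tokens per model parameter. A quick sanity check of the numbers in the conversation:

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# 20 tokens per model parameter.
params = 7e9                               # MPT-7B
chinchilla_tokens = 20 * params            # the "140 billion" point
actual_tokens = 1e12                       # MPT-7B's 1 trillion tokens
print(chinchilla_tokens)                   # 140000000000.0
print(actual_tokens / chinchilla_tokens)   # ~7.1, "almost seven times"
```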
Um, at the end of the day, we know that you need to train these models on [00:09:00] lots of data, but there are a bunch of things we don't know. Number one is what kinds of different data sources matter. The other is how much does repetition really matter? And really, kind of, repetition can be broken down into how much does quality versus quantity matter. Suppose I had the world's best 10 billion tokens of data. Would it be better to train on that a hundred times, or better to train on a trillion tokens of low-quality, fresh data? And obviously there's, there's a middle point in between. That's probably the sweet spot. But how do you even know what good-quality data is? And so, yeah, this is, nobody knows, and I think the more time I spent, we have a whole data team, so me and several other people, the more time that we spent on this, you know, I came away thinking, gosh, we know nothing. Gosh, if I were back in academia right now, I would definitely go and, you know, write a paper about this, because I have no idea what's going on.
Swyx: You would write a paper about it. I'm interested in such a paper. I haven't come across any that exists. Could you frame the central question of such a paper?

THE CENTRAL QUESTION: WHAT MIX OF DATA SETS SHOULD YOU USE? [00:10:00]

Jonathan: Yeah. The central question is what mix of data sets should you use? Okay. Actually, I've, you know, you had mentioned my law school stuff. I went back to Georgetown Law, where I used to teach, um, in the midst of creating this model, and I actually sat down with a class of law students and asked them, I gave them our exact data sets, our data mixes, um, like how many tokens we had, and I said, create the best data set for your model. Knowing they knew nothing about large language models, they just know that data goes in and it's going to affect the behavior. Um, and I was like, create a mix, and they basically covered all the different trade-offs.
Um, you probably want a lot of English language [00:10:30] text to start with. You get that from the web, but do you want it to be multilingual? If so, you're gonna have a lot less English text. Maybe it'll be worse. Do you wanna have code in there? There are all these beliefs that code leads to models being better at logical reasoning, of which I've seen zero evidence. Replit, I mean, really made a great code model, but code models leading to better chain-of-thought reasoning on the part of language, or code being in the training set leading to better chain-of-thought reasoning? People claim this all the time, but I've still never seen any real evidence beyond that. You know, one of the generations of the GPT-3 model started supposedly from Code Davinci. Yes. And so there's a belief that, you know, maybe that helped. But again, no evidence. You know, there's a belief that spending a lot of time on good sources like Wikipedia is good for the model. Again, no evidence. At the end of the day, we tried a bunch of different data mixes, and the answer was that there are some that are better or worse than others. We did find that The Pile, for example, was a really solid data mix, but you know, there were stronger data mixes by our evaluation metrics. And I'll get back to the evaluation question in a minute, cuz that's a really important one. This data set called C4, which is what the original T5 model was trained on, is weirdly good. And everybody, when I posted on this on Twitter, like Stella Biderman from EleutherAI mentioned this, I think someone else mentioned this as well. C4 does really well in the metrics, and we have no idea why. We de-duplicated it against our evaluation set, so it's not like it memorized the data; it is just one web scrape from 2019. If you actually look at the T5 paper and see how it was pre-processed, it looks very silly. Mm-hmm.
They removed anything that had the word JavaScript in it because they didn't want to get, like, "enable JavaScript" [00:12:00] warnings. They removed anything with curly braces cuz they didn't wanna get JavaScript in it. They looked at this list of bad words, um, and removed anything that had those bad words. If you actually look at the list of bad words, words like "gay" are on that list. And so it is a very problematic, you know, list of words, but that was the cleaning that led to a data set that seems to be unbeatable. So that, to me, says that we know nothing about data. We in fact used a data set called mC4 as well, which supposedly is the same pre-processing as C4, just on more web crawls. The English portion is much worse than C4, for reasons that completely escape us. So in the midst of all that, basically I set two criteria. One was I wanted to be at least as good as mC4 English, like, make sure that we're not making things actively worse. And mC4 English is a nice step up over other stuff that's out there. And two was to go all in on diversity after that, making sure that we had some code, we had some scientific papers, we had Wikipedia, because people are gonna use this model for all sorts of different purposes. But I think the most important thing, and I'm guessing Abhi had a million opinions on this, is you're only as good as your evaluation. And we don't know how to evaluate models for the kind of generation we ask them to do. So past a certain point, you have to kinda shrug and say, well, my evaluation's not even measuring what I care about. Mm-hmm. So let me just make reasonable choices.

EVALUATION CHALLENGES OF LLMs [00:13:00]

Swyx: So you're saying MMLU, BIG-bench, that kind of stuff is not convincing for you?

Jonathan: A lot of this stuff is, you've got two kinds of tasks. Some of these are more of multiple-choice-style tasks where there is a right answer.
Um, either you ask the model to spit out A, B, C, or D, or, if you're more [00:13:30] sophisticated, you look at the perplexity of each possible answer and pick the one that the model is most likely to generate. But we don't ask these models to do multiple-choice questions. We ask them to do open-ended generation. There are also open-ended generation tasks, like summarization, where you compare using things like a BLEU score or a ROUGE score, which are known to be very bad ways of comparing text. At the end of the day, there are a lot of great summaries of a paper. There are a lot of great ways to do open-form generation, and so humans are, to some extent, the gold standard. Humans are very expensive. It turns out we can't put them into our eval pipeline and just have the humans look at our model every, you know, 10 minutes? Not yet. Not yet. Maybe soon. Um, are you volunteering, Abhi?

Abhinav: I just know we have a great eval team who's, uh, who's helping us build new metrics. So if they're listening...

Jonathan: But, you know, evaluation of large language models is incredibly hard, and I don't think any of these metrics really truly capture what we expect from the models in practice.

Swyx: Yeah. And we might draw wrong conclusions. There's been a debate recently about the emergence phenomenon, whether or not it's a mirage, right? I don't know if you guys have opinions about that.

Abhinav: Yeah, I've seen that paper and all, even just, kind of, plots from different people, where, like, well, maybe it's just an artifact of, like, log scaling, or metrics, or, you know, we're measuring accuracy, which is this very, like, harsh zero-one thing, rather than kind of something more continuous. But yeah, similar to what Jonathan was saying about evals.
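The likelihood-based multiple-choice scoring Jonathan described can be sketched in a few lines. The per-token log-probabilities are hard-coded here for illustration; in practice they come from the model:

```python
import math

def perplexity(token_logprobs):
    # exp of the average negative log-likelihood per token
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_answer(candidates):
    # candidates: {label: [log-prob of each token in that answer]}
    # Choose the answer the model finds most likely (lowest perplexity).
    return min(candidates, key=lambda label: perplexity(candidates[label]))

choices = {
    "A": [-0.2, -0.3],  # high-probability continuation
    "B": [-2.1, -1.8],
    "C": [-3.0, -2.5],
}
```

The per-token average matters because answers of different lengths would otherwise be penalized just for being longer.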
Like, there's one issue of just the diversity of eval metrics. Like, when we put these models up, even the chat ones, the instruct ones, people are using 'em for such a variety of tasks. There's just almost no way we get ahead of time, like, measuring individual dimensions. And then also, particularly, you know, at the 7B scale, [00:15:00] um, these models still are not super great yet at the really hard tasks, like some of the hardest tasks in MMLU and stuff. So sometimes they're barely scoring, like, above random chance, you know, on really, really hard tasks. So potentially, as we, you know, aim for higher- and higher-quality models, some of these things will be more useful to us. But we kind of had to develop MPT-7B flying a little bit blind on what we knew was coming out, and just going off of, like, you know, a small set of common-sense reasoning tasks, and, of course, you know, just comparing those metrics versus other open-source models.

Alessio: I think fast training and inference was, like, one of the goals, right? So there's always the trade-off between doing the hardest thing and, like, doing all the other things quickly.

Abhinav: Yeah, absolutely. Yeah, I mean, I think, like, you know, even at the 7B scale, you know, uh, people are trying to run these things on CPUs at home. You know, people are trying to port these to their phones. Basically, prioritizing the fact that the small scale would lead to our adoption, that was, like, a big, um, big thing going on.

Alessio: Yeah, and you mentioned, um, FlashAttention and FasterTransformer as, like, two of the core things. Can you maybe explain some of the benefits, and maybe why other models don't use it?

FLASH ATTENTION [00:16:00]

Abhinav: Yeah, absolutely. So FlashAttention is this basically faster implementation of full attention. Um, it's, like, a mathematical equivalent, developed by actually some of our collaborators, uh, at Stanford.
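That "mathematical equivalent" claim can be illustrated with a toy single-head attention in pure Python: a naive two-pass softmax versus the one-pass online-softmax accumulation that FlashAttention's tiling builds on. This is just the math, not the fused GPU kernel:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def naive_attention(q, ks, vs):
    # Standard scaled dot-product attention for one query: materialize
    # all scores, then softmax-weight the values (two passes).
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, vs))
            for j in range(len(vs[0]))]

def online_softmax_attention(q, ks, vs):
    # One pass over keys/values with a running max and normalizer: the
    # online-softmax trick. Same output as naive_attention, but it never
    # materializes the full score vector, which is why memory drops.
    d = len(q)
    running_max, denom = float("-inf"), 0.0
    acc = [0.0] * len(vs[0])
    for k, v in zip(ks, vs):
        s = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
        new_max = max(running_max, s)
        scale = math.exp(running_max - new_max) if denom else 0.0
        denom = denom * scale + math.exp(s - new_max)
        acc = [a * scale + math.exp(s - new_max) * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]
```

Because the two functions agree to floating-point precision, a model config can safely expose a backend toggle: the portable implementation everywhere, the fused one on supported GPUs.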
Uh, the Hazy Research lab. Hazy Research, yeah, exactly.

Jonathan: What, what does the name Hazy Research mean?

Abhinav: I actually have no idea.

Swyx: I have no clue. All these labs have fun names. I always like the stories behind them.

Abhinav: Yeah, absolutely. We really, really liked FlashAttention. We, I think, had integrated it into our repo even as [00:16:30] early as September of last year. And it really just helps, you know, with training speed and also inference speed, and we kind of baked that into the model architecture. And this is kind of unique amongst all the other Hugging Face models you see out there. So with ours, you can actually toggle between normal torch attention, which will work anywhere, and FlashAttention, which will work on GPUs, right out of the box. And that way, I think you get almost like a 2x speedup at training time, and somewhere between, like, 50% to 100% speedup at inference time as well. So again, this is just, like, we really, really wanted people to use these and, like, feel an improvement, and we have the team to help deliver that.

Swyx: Another part, um, of your choices was ALiBi position encodings, which people are very interested in. Maybe a lot of people just, uh, sort of take position encodings as a given, but there's actually a lot of active research, and honestly, it's very opaque as well. Like, people don't know how to evaluate encodings, including position encodings. But could you explain, um, ALiBi and, um, your choice?

Abhinav: Yeah, for sure. The ALiBi and, uh, kind of FlashAttention things all kind of go together in interesting ways, and even with training stability, too. What ALiBi does, really, is that it eliminates the need to have positional embeddings in your model.
Where previously, if you're at token position one, you have a particular embedding that you add, and you can't really go beyond your max position, which usually is about 2000. With ALiBi, they get rid of that and instead just add a bias to the attention map itself that's kind of like this slope. And if at inference time you wanna go much, much larger, they just kind of stretch that slope out to a longer number of positions. And because the slope is kind of continuous and you can interpret it, it all works out. Now, one of [00:18:00] the funny things we found is, like, with FlashAttention, it saved so much memory and, like, improved performance so much that even as early as, kind of, last year, we were profiling models with very long context lengths, up to, like, you know, the 65k that you've seen in the release. We just never really got around to using it, cuz we didn't really know what we might use it for, and also it's very hard to train stably. So we started experimenting with the ALiBi integration, and then we suddenly found that, oh wow, stability improves dramatically, and now we can actually work together with ALiBi at a long context length. That's how we got to, like, our StoryWriter model, where we can stably train these models out to very, very long context lengths and use them performantly.

Jonathan: Yeah.

Swyx: And it's also why you don't have a firm number. Most people now have a firm number on the context length. Now you're just like, eh, 65 to 85.

Abhinav: Oh yeah, there's a big debate: is it 64k or 65k? 65k plus.

Swyx: Just do powers of two. So 64 isn't, you know...

Jonathan: Right, right. Yeah. Yeah. But we could, I mean, technically the context length is infinite. If you give me enough memory, um, you know, we can just keep going forever. We had a debate over what number to say is the longest that we could handle. We picked 84k. It's the longest I expect people to see easily in practice.
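The bias Abhi described can be sketched as follows. This follows the slope recipe from the ALiBi paper for a power-of-two number of heads; the exact matrix layout in any given implementation may differ:

```python
def alibi_slopes(n_heads):
    # Geometric sequence from the ALiBi paper, for power-of-two head counts:
    # head i gets slope 2^(-8 * (i + 1) / n_heads).
    return [2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(slope, seq_len):
    # Bias added to the attention scores for one head: 0 for the current
    # token, increasingly negative for tokens further in the past, and
    # -inf for future tokens (the causal mask). Extending the context at
    # inference time just means building a longer matrix with the same slope.
    return [[-slope * (q - k) if k <= q else float("-inf")
             for k in range(seq_len)]
            for q in range(seq_len)]
```

Since the bias is a pure function of relative distance, nothing in the model is tied to a fixed maximum position, which is what lets context length stretch at inference time.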
But, you know, we played around for even longer than that, and I don't see why we couldn't go longer.

Swyx: Yeah. Um, and so for those who haven't read the blog post, you put The Great Gatsby in there and, uh, asked it to write an epilogue, which seemed pretty impressive.

Jonathan: Yeah. There are a bunch of epilogues floating around internally at Mosaic. Yeah. That wasn't my favorite. I think we all have our own favorites. Yeah. But there are a bunch of really, really good ones. There was one where, you know, it's Gatsby's funeral, and then Nick starts talking to Gatsby's ghost, and Gatsby's father shows up, and, you know, then he's [00:19:30] at the police station with Tom. It was very plot-heavy, like, this is what comes next. And a bunch of them were just very Fitzgerald-esque, like, you know, beautiful writing. Um, but it was cool to just see that, wow, the model seemed to actually be working with, you know, all this input. Yeah, yeah. Like, it's exciting. You can think of a lot of things you could do with that kind of context length.

FINE-TUNING FOR CREATIVITY [00:19:50]

Swyx: Is there a trick to fine-tuning for a creative task rather than, um, a factual task?

Jonathan: I don't know what that is, but probably, yeah. I think, you know, the person, um, Alex, who did this, he did fine-tune the model explicitly on books. The goal was to try to get a model that was really a story writer. But, you know, beyond that, I'm not entirely sure. Actually, it's a great question. Well, no, I'll ask you back. How would you measure that?

Swyx: Uh, God, human feedback is the solve to all things. Um, I think there is a labeling question, right? Uh, in computer vision, we had a really, really good episode with Roboflow on the Segment
Anything Model, where you actually start with human feedback on, I think it's something like 0.5% of the overall final labels that you had. But then you sort of augment them, and then you fully automate them, um, which I think could be applied to text. It seems intuitive, and probably people like Snorkel have already raced ahead on this stuff, but I just haven't seen it applied in the language domain yet.

Jonathan: I mean, there are a lot of things that seem like they make a lot of sense in machine learning that never work, and a lot of things that make zero sense that seem to work. So, you know, I've given up trying to even predict. Yeah, yeah. Until I see the data or try it, I just kinda shrug my shoulders and, you know, you hope for the best. Bring data or else, right? Yeah, [00:21:00] exactly. Yeah, yeah, yeah.

Alessio: The fine-tuning on books. Books3 is, like, one of the big data sets, and there was the whole Twitter thing about it. And, like, you know, I used to be a community moderator at Genius.com, and one of the things we ran into a lot is, well, if you're explaining lyrics, do you have the right to redistribute the lyrics? I know you ended up changing the license on the model from commercial use permitted.

Swyx: Well, I'm not sure they did.

Jonathan: So we flipped it for about a couple hours.

Swyx: Um, okay. Can we introduce the story from the start, just for people who are out of the loop?

Jonathan: Yeah. So I can tell the story very simply. So, you know, the Books3 data set does contain a lot of books. And it is, you know, as I discovered, um, a data set that provokes very strong feelings from a lot of folks. Um, from one person in particular, in fact. And that's about it. But it turns out one person who wants a lot of attention can, you know, get enough attention that we're talking about it now.
And so we had a discussion internally after that conversation, and we talked about flipping the license, and, you know, very late at night, I thought, you know, maybe it's a good thing to do. And then decided, you know, actually, probably better to just, you know, stand pat. The license is still Apache 2.0. And one of the conversations we had was, kind of, we hadn't thought about this cuz we had our heads down, but the Hollywood writers' strike took place basically the moment we released the model. Mm-hmm. Um, we were releasing a model that could do AI-generated creative content, and that is one of the big sticking points during the strike. Oh, the optics are not good. So the optics aren't good, and that's not what we want to convey. This is really a demo of the ability to do really long sequence lengths, and, boy, you know, [00:22:30] that's not timing that we appreciated. And so we talked a lot internally that night, like, oh, we've had time to read the news, we've had time to take a breath. We don't really love this. We came to the conclusion that it's better to just leave it as it is now and learn the lesson for the future. But certainly that was one of my takeaways: there's a societal context around this stuff that it's easy to forget when you're in the trenches, just trying to get the model to train. And, you know, in hindsight, I might've gone with a different thing than a story writer. I might've gone with, you know, a coder, because we seem to have no problem putting programmers out of work with these models.

Swyx: Oh yeah. Please, please, you know, take away this stuff from me.

OPEN SOURCE LICENSES AND ETHICAL CONSIDERATIONS [00:23:00]

Jonathan: Right. You know, so it's, I think, you know, really... The copyright concerns I leave to the lawyers.
Um, that's really... if I learned one thing teaching at a law school, it was that I'm not a lawyer, and all this stuff is a little complicated. Especially, open source licenses were not designed for this kind of world. They were designed for a world of forcing people to be more open, not forcing people to be more closed. And I think, you know, that was part of the impetus here, was to try to use licenses to make things more closed. Um, which is, I think, against the grain of the open source ethos. So that struck me as a little bit strange, but I think the most important part is, you know, we wanna be thoughtful and we wanna do the right thing. And in that case, you know, I hope, with all that interesting licensing fun you saw, we're trying to be really thoughtful about this, and it's hard. I learned a lot from that experience.

Swyx: There's also, I think, an open question of fair use, right? Is training on words fair use? Because you don't have a monopoly on words, but certain arrangements of words you do. And who is to say how much is memorization by a model, versus actually learning and internalizing and then sometimes happening to land at the [00:24:00] same result?

Jonathan: And if I've learned one lesson, I'm not gonna be the person to answer that question. Right, exactly. And so my position is, you know, we will try to make this stuff open and available. Yeah. And, you know, let the community make decisions about what they are or aren't comfortable using. Um, and at the end of the day, you know, it still strikes me as a little bit weird that someone is trying to use these open source licenses to, you know, close the ecosystem and not make things more open. That's very much against the ethos of why these licenses were created.

Swyx: So the official Mosaic position, I guess, is, like, before you use MPT-7B for anything commercial, check with your own lawyers; don't trust Mosaic's lawyers.

Jonathan: Yeah, okay. Yeah.
I'm, you know... our lawyers are not your lawyers. Exactly. And, you know, make the best decision for yourself. We've tried to be respectful of the content creators, and, you know, at the end of the day, this is complicated. And this is new law, law that hasn't been established yet. Um, but it's a place where we're gonna continue to try to do the right thing. Um, and, I think, one of the commenters, you know, I really appreciated this, said, you know, well, they're trying to do the right thing, but nobody knows what the right thing even is. I guess the most right thing would've been to literally not release a model at all, but I don't think that would've been the best thing for the community either.

Swyx: Cool. Well, thanks. Well handled. Uh, we had to cover it, just cuz...

Jonathan: Oh, yes, no worries. It's been a big piece of news. It's been on my mind a lot.

TRAINING STABILITY ENHANCEMENT [00:25:15]

Swyx: Yeah. Yeah. Well, you've been very thoughtful about it. Okay. So a lot of these other ideas in terms of architecture, FlashAttention, ALiBi, and the other data sets were contributions from the rest of the, let's just call it, open community of [00:25:30] machine learning advancements. Uh, but Mosaic in particular had some stability improvements to mitigate loss spikes, quote unquote, uh, which, uh, I took to mean your existing set of tools. Uh, maybe we just kind of covered that. I don't wanna sort of put words in your mouth, but when you say things like, uh, "please enjoy my empty logbook," how much of an oversell is that? How much is that marketing, versus how much is that reality?

Abhinav: Oh yeah. That one's real. Yeah. It's, like, fully end-to-end. Um, and I think...

Swyx: So maybe, like, what specific features of MosaicML?

Abhinav: Totally, totally. Yeah. I think I'll break it into two parts. One is, like, training stability, right?
Knowing that your model's gonna basically get to the end of the training without loss spikes. Um, and I think, you know, at the 7B scale, you know, for some models, it's not that big of a deal. But as you train for longer and longer durations, we found that it's trickier and trickier to avoid these loss spikes. And so we actually spent a long time figuring out, you know, what can we do about our initialization, about our optimizers, about the architecture, that basically prevents these loss spikes. And, you know, even in our training run, if you zoom in, you'll see small intermittent spikes, but they recover within a few hundred steps. And so that's kind of the magical bit. Our line one of defense is we recover from loss spikes, like, just naturally, right? Mm-hmm. Our line two of defense was that we used determinism and basically really smart resumption strategies, so that if something catastrophic happened, we could resume very quickly, like, a few batches before, and apply some of these, uh, interventions. So we had these kinds of preparations, like a plan B, but we didn't have to use them at all for MPT-7B training. So that was kind of a lucky break. And the third part of, like, basically getting all the way to the empty logbook is having the right training infrastructure. [00:27:00] So this is basically what is, like, one of the big selling points of the platform: when you try to train these models on hundreds of GPUs... not many people outside, you know, like, deep industry research orgs know this, but the GPUs fail, like, a lot. Um, I would say, like, almost once every thousand A100-days. So for us, on, like, a big 512-GPU cluster, every two days, basically, the run will fail. Um, and this is either due to GPUs, like, falling off the bus, like, that's a real error we see, or kind of networking failures or something like that.
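The remediation loop Abhi goes on to describe (stop the job, health-check nodes, cordon the broken ones, relaunch from a checkpoint) might look roughly like this hypothetical sketch; the real platform code is not public, and every class and method name here is invented for illustration:

```python
class Node:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy
    def health_check(self):
        # In reality: GPU visibility, NVLink/network checks, etc.
        return self.healthy

class Cluster:
    def __init__(self, nodes):
        self.nodes, self.cordoned = nodes, []
    def cordon(self, node):
        # Mark a broken node so the scheduler stops placing work on it.
        self.cordoned.append(node.name)

class Job:
    def __init__(self, running=False):
        self.running = running
    def is_running(self):
        return self.running
    def stop(self):
        self.running = False
    def latest_checkpoint(self):
        return "ckpt-step-1000"
    def relaunch(self, resume_from):
        # Deterministic training plus fast resumption make this safe.
        self.running = True

def watchdog_step(cluster, job):
    # One pass of the automated on-call loop: no-op while healthy,
    # stop / cordon / relaunch after a failure.
    if job.is_running():
        return "healthy"
    job.stop()
    for node in cluster.nodes:
        if not node.health_check():
            cluster.cordon(node)
    job.relaunch(resume_from=job.latest_checkpoint())
    return "relaunched"
```

Run in a loop, this replaces the human who used to wake up at 2 AM to do the same four steps by hand.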
And so in those situations, what people have normally done is they'll have an on-call team that's just sitting round the clock, 24/7, on Slack, for when something goes wrong. And then they'll basically try to inspect the cluster, take out nodes that are broken, restart it, and it's a huge pain. Like, we ourselves did this for a few months. And as a result of that, because we're building such a platform, we basically, step by step, automated every single one of those processes. So now when a run fails, we have this automatic kind of watchdog that's watching. It'll basically stop the job, test the nodes, cordon any that are broken, and relaunch it. And because our software's all deterministic and has fast resumption stuff, it just continues on gracefully. So within that log, you can see, sometimes, I think, maybe at, like, 2:00 AM or something, the run failed, and within a few minutes it's back up and running, and all of us are just sleeping peacefully.

Jonathan: I do wanna say that was hard-won. Mm-hmm. Um, certainly this is not how things were going, you know, many months ago. Hardware failures... we had on-call folks who were, you know, getting up at two in the morning to, you know, figure out which node had died for what reason, restart the job, have to cordon the node. [00:28:30] Um, we were seeing catastrophic loss spikes really frequently, even at the 7B scale, that were just completely derailing runs. And so this was, step by step, just ratcheting our way there, as Abhi said, to the point where many models are training at the moment, and I'm sitting here in the studio and not worrying one bit about whether the runs are gonna continue. Yeah.

Swyx: I'm not so much of a data center hardware kind of guy, but isn't there existing software to do this for CPUs? And, like, what's different about this domain?
Does this question make sense at all?

Jonathan: Yeah. So when I think about this, I think back to all the Google fault-tolerance papers I read, you know, as an undergrad or grad student. Mm-hmm. About, you know, building distributed systems. A lot of it is that, you know, each CPU is doing, say, an individual unit of work. You've got a database that's distributed across your cluster. You wanna make sure that one CPU failing, or one machine failing, can't, you know, delete data. So you replicate it. You know, you have protocols like Paxos, where you've literally got state machines that are replicated, with, you know, leaders and backups and things like that. And in this case, you're performing one giant computation, where you cannot afford to lose any node. If you lose a node, you lose model state. If you lose a node, you can't continue. It may be that in the future we actually, you know, create new versions of a lot of our distributed training libraries that do have backups, and where data is replicated, so that if you lose a node, you can detect which node you've lost and just continue training without having to stop the run, you know, pull from a checkpoint, and restart again on different hardware. But for now, we're certainly in a world where if anything dies, that's the end of the run, and you have to go back and recover from it. [00:30:00]

DATA READINESS & TRAINING PREPARATION [00:30:00]

Abhinav: Yeah. Like, I think a big word there is synchronous data parallelism, right? So, like, we're basically saying that on every step, every GPU is gonna do some work. They're gonna stay in sync with each other and average their gradients and continue. Now, there are algorithmic techniques to get around this. Like, you could say, oh, if a GPU dies, just forget about it. All the data it was gonna see, we'll just forget about it.
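That synchronous averaging step can be sketched in a few lines; this is a pure-Python stand-in for the all-reduce, where real training uses NCCL collectives over GPUs:

```python
def allreduce_mean(per_worker_grads):
    # Every worker contributes its local gradient; all receive the average.
    n = len(per_worker_grads)
    dim = len(per_worker_grads[0])
    return [sum(g[i] for g in per_worker_grads) / n for i in range(dim)]

def synchronous_step(params, per_worker_grads, lr=0.1):
    # All workers apply the same averaged gradient, so their model copies
    # stay bit-identical; this lockstep is exactly what a dead GPU breaks.
    avg = allreduce_mean(per_worker_grads)
    return [p - lr * g for p, g in zip(params, avg)]
```

Since every worker blocks on the average, one missing participant stalls the whole step, which is why node failure ends the run rather than degrading it.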
We're not gonna train on it. But we don't like to do that currently, because, um, it makes us give up determinism, stuff like that. Maybe in the future, as you go to extreme scales, we'll start looking at some of those methods. But at the current time, it's like, we want determinism. We wanted to have a run that we could perfectly replicate if we needed to. And the goal was to figure out how to run it on a big cluster without humans having to babysit it.

Alessio: So, as you mentioned, these models are kind of the starting point for a lot of your customers. You have an inference product, you have a training product. You previously had a Composer product that is now not rolled into, but you have, like, a superset of it, which is, like, the LLM Foundry. How are you seeing that change, you know, like, from the usual MLOps stack and, like, how people trained things before, versus now, when they're starting from, you know, one of these MPT models and going from there? Like, what should teams think about as they come to you and start their journey?

Jonathan: So I think there's a key distinction to make here, which is, you know, when you say starting from MPT models, you can mean two things. One is actually starting from one of our checkpoints, which I think very few of our customers are actually going to do, and one is starting from our configuration. You can look at our friends at Replit for that, where, you know, MPT was in progress when Replit [00:31:30] came to us and said, hey, we need a 3-billion-parameter model by next week on all of our data. We're like, well, here you go. This is what we're doing, and if it's good enough for us, um, hopefully it's good enough for you. And that's basically the message we wanna send to our customers.
MPT is basically clearing a path all the way through, where they know that they can come bring their data, they can use our training infrastructure, they can use all of our amazing orchestration and other tools that Abhi just mentioned for fault tolerance. They can use Composer, which is, you know, still at the heart of our stack. And then the LLM Foundry is really the specific model configuration. They can come in and they know that thing is gonna train well, because we've already done it multiple times.

Swyx: Let's dig in a little bit more on what people should have ready before they come talk to you. So data, architecture, evals that they're looking at, etc.?

Abhinav: Yeah, I mean, I think we'll accept customers at any kind of stage in their pipeline. You know, there are archetypes of people who have built products around, like, some of these API companies, and reach a stage or maturity level where it's like, we want our own custom models now, either for the purpose of reducing cost, right, like, our inference service is quite a bit cheaper than using APIs, or because they want some kind of customization that you can't really get from the other API providers. I'd say the most important things to have before training a big model: you know, you wanna have good eval metrics, you know, some kind of score that you can track as you're training your models and scaling up, that can tell you you're progressing. And it's really funny, like, a lot of times customers will be really excited about training the models, right? It's really fun to, like, launch jobs on hundreds of GPUs. It's super fun. But then they'll be like, but wait, what are we gonna measure? Not just the training loss, right? I mean, it's gotta be more than that. [00:33:00] So eval metrics is, like, a good prereq. Also, you know, your data: you know, either coming with your own pre-training or fine-tuning data, and having, like, a strategy to clean it, or we can help clean it too.
I think we're building a lot of tooling around that. And I think once you have those two kinds of inputs, and sort of the budget that you want, we can pretty much walk you through the rest of it, right? Like, that's kind of what we do. Recently we helped build CRFM's model for biomedical language a while back.

Jonathan: That's the Center for Research on Foundation Models.

Abhinav: Exactly, exactly.

Jonathan: Spelling it out for people. Of course.

Abhinav: No, absolutely. Yeah, yeah. No, you've done more of these than I have. Um, I think, uh, basically, it's sort of, we can help you figure out which model to train and scale up, so that when you go for your big run, your hero run, it's predictable. You can feel confident that it's gonna work, and you'll kind of know what quality you're gonna get out, before you have to spend, like, a few hundred thousand dollars.

DYNAMIC REAL-TIME MODEL EVALUATION [00:34:00]

Alessio: Reza from Replit was on the podcast last week, and, uh, they had HumanEval, and then, uh, AmjadEval, which is, like, vibe-based.

Jonathan: And I do think the vibe-based eval cannot be, you know, underrated. Really, I mean, at the end of the day, we did stop our models and do vibe checks, and as we monitored our models, one of our evals was, we just had a bunch of prompts, and we would watch the answers as the model trained and see if they changed. Cuz, honestly, you know, I don't really believe that any of these eval metrics capture what we care about. Mm-hmm. But when you ask it, uh, you know... I think one of our prompts was to suggest games for a three-year-old and a seven-year-old that would be fun to play. Like, that was a lot more [00:34:30] valuable to me, personally, to see how that answer evolved and changed over the course of training. So, you know... and HumanEval, just to clarify for folks, is an automated evaluation metric. There's no humans in it at all.
There are no humans in it at all. It's really badly named. I got so confused the first time that someone brought it to me, and I was like, no, we're not bringing humans in. It's like, no, it's automated. They just gave it a bad name, and there are only a hundred-some problems in it or something.

Abhinav: Yeah. And it's for code, specifically, right?

Jonathan: Yeah. Yeah. It's very weird. It's a weird, confusing name that I hate, but, you know, when other metrics are called HellaSwag, like, you know, you just gotta roll with it at this point.

Swyx: You're doing live evals now. So one of the tweets that I saw from you was that it is, uh, important that you do it parallelized. Uh, maybe you kind of wanna explain, uh, what you guys did.

Abhinav: Yeah, for sure. So with LLM Foundry, there are many pieces to it. There's obviously the core training piece, but there are also, you know, tools for evaluation of models. And we've had, I think, like, the fastest evaluation framework. Um, basically, it's multi-GPU compatible, it runs with Composer, it can support really, really big models. Basically, our framework runs so fast that even as our models are training, we can run these metrics live during the training. So, like, if you have a dashboard, like Weights and Biases, you can kind of watch all these eval metrics. We have, like, 15 or 20 of them, honestly, that we track during the run, and they add negligible overhead. So we can actually watch as our models train and feel confident. Like, it's not like we wait until the very last day to test if the model's good or not.

Jonathan: That's amazing. Yeah. I love that we've gotten this far into the conversation and we still haven't talked about efficiency and speed. Those are usually our two watchwords at Mosaic. Which is, you know, great; that says that we're [00:36:00] doing a lot of other cool stuff. But at the end of the day, um, you know, cost comes first.
If you can't afford it, it doesn't matter. And so, you know, getting things down cheap enough that, you know, we can monitor in real time, getting things down cheap enough that we can even do it in the first place. That's the basis for everything we do. OPEN SCIENCE FOR AFFORDABLE AI RESEARCH [00:36:00] Alessio: Do you think a lot of the questions that we have around, you know, what datasets we should use and things like that are just because training was so expensive before, that we just haven't run enough experiments to figure that out? And is that one of your goals, trying to make it cheaper so that we can actually get the answers? Jonathan: Yeah, that's a big part of my personal conviction for being here. I think I'm still, in my heart, the second-year grad student who was jealous of all his friends who had GPUs when he didn't, and I couldn't train any models except on my laptop. And I mean, the lottery ticket experiments began on my laptop; I had to beg for one K80 so that I could run MNIST. And I'm still that person deep down in my heart. And I'm a believer that, you know, if we wanna do science and really understand these systems, understand how to make them work well, understand how they behave, understand what makes them safe and reliable, we need to make it cheap enough that we can actually do science. And science involves running dozens of experiments. When I finally, you know, cleaned out my GCS bucket from my PhD, I deleted a million model checkpoints. I'm not kidding. There were over a million model checkpoints. That is the kind of science we need, you know, that's just what it takes. In the same way that if you're in a biology lab, you don't just grow one cell and say like, eh, the drug seems to work on that cell. Like, there's a lot more science you have to do before you really know. Abhinav: Yeah.
And I think one of the special things about Mosaic's kind of [00:37:30] position as well is that we have so many customers all trying to train models that basically we have the incentive to devote all these resources and time to do this science. Because when we learn which pieces actually work and which ones don't, we get to help many, many people, right? And so that kind of aggregation process, I think, is really important for us. I remember way back there was a paper from Google that basically investigated batch sizes or something like that. And it was this paper that must have cost a few million dollars for all the experiments. And it was just like, wow, what a benefit to the whole community. Now we all get to learn from that, and we get to save; we don't have to spend those millions of dollars anymore. So I think, um, Mosaic's kind of science, like the insights we get on data, on pre-training, on architecture, on all these different things, um, that's why customers come to us. Swyx: Yeah, you guys did some really good stuff on PubMedGPT as well. That's the first time I heard of you. And that's also published to the community. Abhinav: Yeah, that one was really fun. We were like, well, no one's really trained, like, fully-from-scratch domain-specific models before. Like, what if we just did a biomed one? Would it still work? And, uh, yeah, we were really excited that it did. Um, we'll probably have some follow-up soon, I think, later this summer. Jonathan: Yeah. Yes. Stay tuned on that. Um, but I will say, just in general, it's a really important value for us to be open. In some sense, we have no incentive not to be open. You know, we make our money off of helping people train better. There's no cost to us in sharing what we learn with the community. Cuz really, at the end of the day, we make our money off of those custom models and great infrastructure and putting all the pieces together.
That's honestly where the Mosaic name came from. Not off of, like, oh, we've got, you know, this one cool secret trick [00:39:00] that we won't tell you, or, you know, closing up. I sometimes, you know, in the past couple weeks I've talked to my friends at places like Brain, or, you know, what used to be Brain, now Google DeepMind. Swyx: Oh, RIP Brain. Jonathan: Yeah, RIP Brain. I spent a lot of time there and it was a really formative time for me. Um, so I miss it, but, you know, I kind of feel like we're one of the biggest open research labs left in industry, which is a very sad state of affairs, because we're not very big. Swyx: Um, can you say how big the team is, actually? Jonathan: Yeah, we're about 15 researchers, so we're tiny compared to, you know, the huge army of researchers I remember at Brain or at FAIR or at DeepMind back, you know, when I was there during their heydays. Um, you know, but everybody else is kind of, you know, closed up and isn't saying very much anymore. Yeah. And we're gonna keep talking and we're gonna keep sharing, and, you know, we will try to be that vanguard to the best of our ability. We're very small, and I can't promise we're gonna do what those labs used to do in terms of scale or quantity of research, but we will share what we learn, and we will try to create resources for the community. Um, I dunno, I just, I believe in openness fundamentally. I'm an academic at heart, and it's sad to me to watch that go away from a lot of the big labs. THE OPEN APPROACH [00:40:15] Alessio: We just had a live pod about, you know, the OpenAI's "no moat", uh, post that came out, and it was one of the first times I really dove into LoRA and some of these new technologies. Like, how are you thinking about what it's gonna take for, like, the open approach to really work? Obviously today GPT-4 is still, you know, kind of the state-of-the-art model for a [00:40:30] lot of tasks.
Do you think some of the innovation and kind of training methods that we have today are enough, if enough people like you guys are running these research groups that are open? Or do you think we still need a step-function improvement there? Jonathan: I think one important point here is the idea of coexistence. I think when you look at, I don't know, who won, Linux or Windows? The answer is yes. Microsoft bought GitHub and has a Windows Subsystem for Linux. Linux runs a huge number of our servers, and Microsoft is still a wildly profitable company, probably the most successful tech company right now. So who won, open source or closed source? Yes. Um, and I think that's a similar world that we're gonna be in here, where, you know, it's gonna be different things for different purposes. I would not run Linux on my laptop personally, cuz I like connecting to wifi and printing things. But I wouldn't run Windows on one of my servers. And so I do think what we're seeing with a lot of our customers is: do they choose OpenAI or Mosaic? Yes. There's a purpose for each of these. You have to send your data off to somebody else with OpenAI's models. That's a risk. GPT-4 is amazing, and I would never promise someone that if they come to Mosaic, they're gonna get a GPT-4 quality model. That's way beyond our means and not what we're trying to do anyway. But there's also a whole world for, you know, domain-specific models, context-specific models that are really specialized, proprietary, trained on your own data, that can do things that you could never do with one of these big models. You can customize in crazy ways. Like, GPT-4 is not gonna hit 65K context length for a very long time, cuz they've already trained that [00:42:00] model, and, you know, they haven't even released the 32K version yet. So we can, you know, we can do things differently, you know, by being flexible. So I think the answer to all this is yes. But we can't see the open source ecosystem disappear.
And that's the scariest thing for me. I hear a lot of talk in academia about, you know, whatever happened to that academic research on this field called information retrieval? Well, in 1999 it disappeared. Why? Because Google came along, and who cares about information retrieval research when, you know, you have a Google-scale, you know, web-scale database? So, you know, there's a balance here. We need to have both. Swyx: I wanna applaud you there. We'll maybe edit in a little, like, crowd-applause, uh, line. Cuz I do think that, um, that is something that, as a research community, as people interested in progress, we need to see these things, instead of just, uh, seeing marketing papers, like the ones advertising GPT-4. Jonathan: Yeah. I think, you know, to get on my soapbox for 10 more seconds. Swyx: Go ahead. Jonathan: When I talk to policymakers about, you know, the AI ecosystem, the usual fear that I bring up is: innovation will slow because of lack of openness. I've been complaining about this for years, and it's finally happened. Hmm. Why was Google sharing, you know, these papers? Why was OpenAI sharing these papers? There are a lot of reasons. You know, I have my own beliefs, but it's not something we should take for granted, that everybody's sharing the work that they do. And it turns out, well, I think we took it for granted for a while, and now it's gone. I think it's gonna slow down the pace of progress. In a lot of cases, each of these labs has a bit of a monoculture, and being able to pass ideas [00:43:30] back and forth was a lot of what kept, you know, scientific progress moving. So it's imperative, not just, you know, for the open source community and for academia, but for the progress of technology, that we have a vibrant open source research community. THE FUTURE OF MOSAIC [00:44:11] Swyx: There's a preview of the ecosystem and commentary that we're gonna do. But I wanna close out some stuff on Mosaic. You launched a bunch of stuff this month.
A lot of stuff. Uh, actually, I was listening to you on Gradient Dissent, uh, and other podcasts we know and love. Uh, and you said, you also said, you were not gonna do inference. And last week you were like, here's MosaicML Inference. Oops. So maybe just, at a high level, what was MosaicML, and, like, what is it growing into? Like, how do you conceptualize this? Jonathan: Yeah, and I will say, when Gradient Dissent was recorded, we weren't doing inference and had no plans to do it. It took a little while for the podcast to get out. Um, in the meantime, basically, you know, one thing I've learned at a startup, and I'm sure Abhi can comment on this as well: focus is the most important thing. We have done our best work when we've been focused on doing one thing really well, and our worst work when we've tried to do lots of things. Yeah. So we didn't want to do inference, we didn't want to have to do inference. Um, and at the end of the day, our customers were begging us to do it, because they wanted a good way to serve the models, and they liked our ecosystem. And so in some sense, we got dragged into it kicking and screaming. We're very excited to have a product. We're going to put our best foot forward and make something really, truly amazing. But there is, you know, that's something that we were reluctant to do. You know, our customers convinced us it would be good for our business. It's been wonderful for business, and we are gonna put everything into this. But, you know, back when Gradient Dissent came out, or when we recorded it, I [00:45:00] was thinking, oh God, like, focus is the most important thing. I've learned that the hard way multiple times at Mosaic; Abhi can tell you, like, you know, I've made a lot of mistakes in not focusing enough. Um, boy, inference, that's a whole second thing, and a whole different animal from training.
And at the end of the day, when we founded the company, our belief was that inference was relatively well served at that time. There were a lot of great inference companies out there. Um, training was not well served, especially efficient training, and we had something to add there. I think we've discovered that, as the nature of the models has changed, the nature of what we had to add to inference changed a lot, and there became an opportunity for us to contribute something. But that was not the plan. But now we do wanna be the place that people come when they wanna train these big, complex, difficult models and know that it's gonna go right the first time, and they're gonna have something they can serve right away. Um, you know, really the Replit example of, you know, with 10 days to go, saying, hey, can you please train that model? And, you know, three or four days later the model was trained, and we were just having fun doing interesting fine-tuning work on it for the rest of the 10 days, you know. That also requires good inference. Swyx: That's true, that's true. Like, so running evals and fine-tuning. I'm just putting my business hat on, and, you know, Alessio as well. Like, uh, I've actually had fights with potential co-founders about this, on the primary business almost, like, being training, right? Like, essentially a one-time cost. Jonathan: Who told you it was a one-time cost? What, who, who told you that? Swyx: No, no, no, no. Correct me. Jonathan: Yeah. Yeah. Let me correct you in two ways. Um, as our CEO Naveen would say, if he were here: when you create version 1.0 of your software, do you then fire all the engineers? Of [00:46:30] course not. You never, like, MPT has a thousand different things we wanted to do that we never got to. So, you know, there will be future models. Abhinav: And the data that it's been trained on is also changing over time too, right?
If you wanna ask anything about, I guess, like, May of 2023, we'll have to retrain it further, and so on, right? And I think this is especially true for customers who run, like, the kind of things that need to be up to date on world knowledge. So I think, like, you know, the other thing I would say too is that the models we have today are certainly not the best models we'll ever produce. Right? They're gonna get smaller, they're gonna get faster, they're gonna get cheaper, they're gonna get lower latency, they're gonna get higher quality. Right? And so you always want the next-gen version of MPT, and the one after that, and the one after that. There's a reason that even the GPT series goes three, four, and we know there's gonna be a five. Right? Um, so I also don't see it as a one-time cost. Jonathan: Yeah. Yeah. And if you wanna cite a stat on this, there are very, very

The Nonlinear Library
LW - Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1 by StefanHex

The Nonlinear Library

Play Episode Listen Later May 9, 2023 17:19


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1, published by StefanHex on May 9, 2023 on LessWrong. We solved the first Mechanistic Interpretability challenge that Stephen Casper posed in EIS VII. We spent the last Alignment Jam hackathon attempting to solve the two challenges presented there, and present our (confirmed) solution to the CNN challenge here. We will present a write-up of our work on the Transformer challenge in a future post. Stefan and Marius submitted an early version of this work at the end of the Hackathon, and Stefan added Intervention and Causal Scrubbing tests to the final write-up. A notebook reproducing all results is provided here (requires no GPU, but ~13 GB RAM). The challenges each provide a pre-trained network, and the task is to reverse engineer the network as well as to infer the labeling function used for training. The first challenge network is an MNIST CNN that takes MNIST images and outputs labels. The hints given are that [1] the labels are binary, [2] the test set accuracy is 95.58%, [3] the (secret) labeling function is simple, and [4] this image: The MNIST network consists of 2 convolutional layers ((Conv -> ReLU -> Dropout -> Pool) x2) and 2 fully connected layers (fc1[400,200] -> ReLU -> fc2[200,2]), and we can access the data (torchvision.datasets.MNIST) but not the ground truth labels. Spoilers ahead! Summary of our solution (TL;DR) The inputs are labelled based on similarity with a 1 versus similarity with an inverted 1 ("anti-1"). If the difference is large (either clearly 1 or clearly anti-1), the image is labeled as class 1; otherwise the image is labeled as class 0. Specifically, the template for 1 seems to be the given hint (clue_image), and the "anti-1" is 1-clue_image.
The similarity is measured as the sum over the element-wise product of the image matrices (or equivalently, the dot product of the flattened image arrays). Then the ~17k images most similar to "1" and the ~14k images most similar to "anti-1" are labelled class 1, and the remaining ~29k images are labelled class 0. We can also phrase this as a band filter for similarity with (clue_image - 0.5), defining class 0 as where -17305 < (image * (clue_image - 0.5)).sum() < -7762. We can observe this internally by looking at the embedding in the 200-neuron space; a PCA decomposition colored by label shows how the model judges this similarity: The model internally implements this via two groups of feature detectors (in the 200-dimensional neuron layer). These are "1-detectors" (detecting clue_image) and "anti-1-detectors" (detecting 1-clue_image). If either group fires sufficiently strongly, the image is classified as class 1, otherwise class 0. This classification has 96.2% overlap with the model labels. Since the model itself has only 95.6% accuracy on the test set, we think the difference is plausibly due to model error. We test our hypothesis with Causal Scrubbing. Specifically, we test the hypothesis that of the 200 neurons, 48 detect "1" similarity, 31 detect "anti-1" similarity, and 121 are dead (useless) neurons. We resample-ablate all neurons by replacing each neuron's activation with its activation on a different dataset example where our hypothesis predicts it to have a similar activation. Part 1: How we found this solution Since the model prediction depends on the logit (final neuron) difference, we can directly say that in the final layer we only care about the logit of label 1 minus the logit of label 0, the logit diff direction. Then, using the [200,2] fc2 weight matrix (biases irrelevant), we can directly translate this logit diff direction into the 200-dim neuron space (by taking the difference of the respective weight vectors).
We will make use of the logit diff directions at multiple points throughout the analysis. PCA ...
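The recovered labeling rule in the summary above is simple enough to sketch in a few lines of NumPy, using the band-filter thresholds quoted in the post. The real clue_image is the hint image from the challenge, which we can't reproduce here, so the demo below uses an all-zeros stand-in purely to exercise the function:

```python
import numpy as np

# Thresholds quoted in the write-up: scores inside this band are class 0.
LOW, HIGH = -17305, -7762

def label(image: np.ndarray, clue_image: np.ndarray) -> int:
    """Reverse-engineered labeling rule: similarity to (clue_image - 0.5),
    measured as the sum of the element-wise product, then band-filtered."""
    score = (image * (clue_image - 0.5)).sum()
    return 0 if LOW < score < HIGH else 1

# Toy check with a stand-in 28x28 "clue" of zeros: clue - 0.5 = -0.5 everywhere,
# so an all-ones image scaled by c has score -0.5 * 784 * c = -392 * c.
clue = np.zeros((28, 28))
print(label(np.ones((28, 28)) * 25.5, clue))  # score = -9996, inside band -> 0
print(label(np.ones((28, 28)), clue))         # score = -392, outside band -> 1
```

On the actual MNIST data with the actual clue_image, the post reports this rule agrees with the model's labels on about 96.2% of inputs.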


PaperPlayer biorxiv neuroscience
Comparison of Anatomical and Diffusion MRI for detecting Parkinson's Disease using Deep Convolutional Neural Network

PaperPlayer biorxiv neuroscience

Play Episode Listen Later May 1, 2023


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.05.01.538952v1?rss=1 Authors: Chattopadhyay, T., Singh, A., Laltoo, E., Boyle, C. P., Owens-Walton, C., Chen, Y.-L., Cook, P., McMillan, C., Tsai, C.-C., Wang, J.-J., Wu, Y.-R., van der Werf, Y., Thompson, P. M. Abstract: Parkinson's disease (PD) is a progressive neurodegenerative disease that affects over 10 million people worldwide. Brain atrophy and microstructural abnormalities tend to be more subtle in PD than in other age-related conditions such as Alzheimer's disease, so there is interest in how well machine learning methods can detect PD in radiological scans. Deep learning models based on convolutional neural networks (CNNs) can automatically distil diagnostically useful features from raw MRI scans, but most CNN-based deep learning models have only been tested on T1-weighted brain MRI. Here we examine the added value of diffusion-weighted MRI (dMRI) - a variant of MRI, sensitive to microstructural tissue properties - as an additional input in CNN-based models for PD classification. Our evaluations used data from 3 separate cohorts - from Chang Gung University, the University of Pennsylvania, and the PPMI dataset. We trained CNNs on various combinations of these cohorts to find the best predictive model. Although tests on more diverse data are warranted, deep-learned models from dMRI show promise for PD classification. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
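The abstract doesn't specify how the dMRI input enters the CNNs. One common pattern for multimodal MRI (an assumption on our part, not necessarily the authors' pipeline) is to stack co-registered volumes channel-wise, so only the first convolution's input-channel count changes:

```python
import numpy as np

# Hypothetical co-registered 3D volumes (e.g., 64^3 voxels after resampling).
t1 = np.random.rand(64, 64, 64)   # T1-weighted anatomical volume
fa = np.random.rand(64, 64, 64)   # a dMRI-derived map, e.g. fractional anisotropy
md = np.random.rand(64, 64, 64)   # another dMRI-derived map, e.g. mean diffusivity

# Channel-wise stacking: the CNN input grows from 1 to 3 channels,
# and the rest of the architecture can stay unchanged.
x = np.stack([t1, fa, md], axis=0)
print(x.shape)  # (3, 64, 64, 64)
```

The alternative designs (separate encoders per modality with late fusion, or training per-modality models and ensembling) would also fit the abstract's description of "various combinations"; the channel-stacking version is simply the smallest change to a T1-only CNN.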

PaperPlayer biorxiv neuroscience
Competitive performance and superior noise robustness of a non-negative deep convolutional spiking network

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Apr 24, 2023


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.22.537923v1?rss=1 Authors: Rotermund, D., Garcia-Ortiz, A., Pawelzik, K. R. Abstract: Networks of spiking neurons promise to combine energy efficiency with high performance. However, spiking models that match the performance of current state-of-the-art networks while requiring moderate computational resources are still lacking. Here we present an alternative framework to deep convolutional networks (CNNs), the "Spike by Spike" network (SbS), together with an efficient backpropagation algorithm. SbS implements networks based on non-negative matrix factorisation (NNMF), but uses discrete events as signals instead of real values. On clean data, the performance of CNNs is matched by both NNMF-based networks and SbS. SbS is found to be most robust when the data is corrupted by noise, especially when this noise was not seen before. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
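The spike-based SbS machinery itself isn't spelled out in the abstract, but the NNMF it builds on is classical. As a reference point, here is a minimal sketch of the standard Lee-Seung multiplicative updates (not the SbS variant), which keep all factors non-negative and monotonically reduce the reconstruction error:

```python
import numpy as np

def nnmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Classic multiplicative-update NNMF: V (m x n, non-negative) ~ W @ H.
    Plain Lee-Seung updates, not the spike-based SbS variant from the paper."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps   # non-negative basis
    H = rng.random((rank, n)) + eps   # non-negative coefficients
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

V = np.random.default_rng(1).random((20, 30))
W, H = nnmf(V, rank=5)
assert (W >= 0).all() and (H >= 0).all()  # non-negativity is preserved
```

SbS, as the abstract describes it, replaces the real-valued signals in such a factorisation with discrete spike events; the sketch above only shows the underlying optimisation the authors start from.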

PaperPlayer biorxiv neuroscience
Human visual cortex and deep convolutional neural network care deeply about object background

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Apr 14, 2023


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.04.14.536853v1?rss=1 Authors: Loke, J., Seijdel, N., Snoek, L., Sörensen, L. K. A., van de Klundert, R., van der Meer, M., Quispel, E., Cappaert, N., Scholte, H. S. Abstract: Deep convolutional neural networks (DCNNs) are able to predict brain activity during object categorization tasks, but the factors contributing to this predictive power are not fully understood. Our study aimed to investigate the factors contributing to the predictive power of DCNNs in object categorization tasks. We compared the activity of four DCNN architectures with electroencephalography (EEG) recordings obtained from 62 human subjects during an object categorization task. Previous physiological studies on object categorization have highlighted the importance of figure-ground segregation - the ability to distinguish objects from their backgrounds. Therefore, we set out to investigate if figure-ground segregation could explain DCNNs' predictive power. Using a stimulus set consisting of identical target objects embedded in different backgrounds, we examined the influence of object background versus object category on both EEG and DCNN activity. Crucially, the recombination of naturalistic objects and experimentally-controlled backgrounds creates a sufficiently challenging and naturalistic task, while allowing us to retain experimental control. Our results showed that early EEG activity (less than 100 ms) and early DCNN layers represent object background rather than object category. We also found that the predictive power of DCNNs on EEG activity is related to processing of object backgrounds, rather than categories. We provided evidence from both trained and untrained (i.e. random weights) DCNNs, showing figure-ground segregation to be a crucial step prior to the learning of object features.
These findings suggest that both the human visual cortex and DCNNs rely on the segregation of object backgrounds and target objects in order to perform object categorization. Altogether, our study provides new insights into the mechanisms underlying object categorization, as we demonstrated that both the human visual cortex and DCNNs care deeply about object background. Copyright belongs to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC

PaperPlayer biorxiv neuroscience
Estimating receptive fields of simple and complex cells in early visual cortex: A convolutional neural network model with parameterized rectification

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Mar 31, 2023


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.30.534278v1?rss=1 Authors: Nguyen, P., Sooriyaarachchi, J., Huang, Q., Baker, C. L. Abstract: Neurons in the primary visual cortex respond selectively to simple features of visual stimuli, such as orientation and spatial frequency. Simple cells, which have phase-sensitive responses, can be modeled by a single receptive field filter in a linear-nonlinear model. However, it is challenging to analyze phase-invariant complex cells, which require more elaborate models having a combination of nonlinear subunits. Estimating parameters of these models is made more difficult by cortical neurons' trial-to-trial response variability. We develop a simple convolutional neural network method to estimate receptive field models for both simple and complex visual cortex cells from their responses to natural images. The model consists of a spatiotemporal filter, a parameterized rectifier unit (PReLU), and a two-dimensional Gaussian "map" of the receptive field envelope. A single model parameter determines the simple vs. complex nature of the receptive field, capturing complex cell responses as a summation of homogeneous subunits, and collapsing to a linear-nonlinear model for simple type cells. The convolutional method predicts simple and complex cell responses to natural image stimuli as well as grating tuning curves. The model estimates yield a continuum of values for the PReLU parameter across the sampled neurons, showing that the simple/complex nature of cells can vary in a continuous manner. We demonstrate that complex cells respond less reliably than simple cells - compensation for this unreliability reveals good predictive performance on novel sets of natural images, with predictive performance for complex cells proportionately closer to that for simple cells. Most spatial receptive field structures are well fit by Gabor functions, whose parameters confirm well-known properties of cat A17/18 receptive fields. 
Copyright belongs to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC
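The parameterized rectification described in this abstract is compact enough to sketch. The following minimal NumPy illustration (an assumption-laden toy, not the authors' code) shows how a single slope parameter moves a unit between phase-sensitive and phase-invariant behavior:

```python
import numpy as np

def prelu(x, a):
    """Parameterized rectifier: identity for positive inputs, slope `a` for negative ones."""
    return np.where(x > 0, x, a * x)

x = np.linspace(-1.0, 1.0, 5)
# a = 0 collapses to a ReLU: half-wave rectification, phase-sensitive ("simple-cell-like")
print(prelu(x, 0.0))
# a = -1 gives full-wave rectification |x|: phase-invariant ("complex-cell-like")
print(prelu(x, -1.0))
```

Intermediate values of `a` trace out the simple-to-complex continuum that the paper reports across its sampled neurons.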

PaperPlayer biorxiv neuroscience
Robust neural tracking of linguistic speech representations using a convolutional neural network.

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Mar 31, 2023


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.03.30.534911v1?rss=1 Authors: Puffay, C., Vanthornhout, J., Gillis, M., Accou, B., Van hamme, H., Francart, T. Abstract: Objective: When listening to continuous speech, populations of neurons in the brain track different features of the signal. Neural tracking can be measured by relating the electroencephalogram (EEG) to the speech signal. Recent studies have shown a significant contribution of linguistic features over acoustic neural tracking using linear models. However, linear models cannot capture the nonlinear dynamics of the brain. We introduce a convolutional neural network (CNN) that relates EEG to linguistic features, uses phoneme or word onsets as a control, and has the capacity to model nonlinear relations. Approach: We integrate phoneme- and word-based linguistic features (phoneme surprisal, cohort entropy, word surprisal and word frequency) in our nonlinear CNN model and investigate whether they carry additional information on top of lexical features (phoneme and word onsets). We compare the results to a linear decoder and a linear CNN, and evaluate the impact of the model's architecture, the presence of linguistic features, and the training paradigm on match-mismatch task performance. Main results: For the nonlinear CNN, we found a significant contribution of cohort entropy over phoneme onsets, and of word surprisal and word frequency over word onsets. The training paradigm and architecture have a significant impact on performance, and the nonlinear CNN outperforms the linear baselines on the match-mismatch task. Significance: Measuring the coding of linguistic features in the brain is important for auditory neuroscience research and for applications that involve objectively measuring speech understanding. This is measurable with linear models, but the effects are very small. 
The proposed nonlinear CNN model yields larger effect sizes and therefore could reveal effects that would otherwise be unmeasurable, and may in the future lead to improved within-subject measures and shorter recording durations. Copyright belongs to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC

PaperPlayer biorxiv neuroscience
A Convolutional Autoencoder-based Explainable Clustering Approach for Resting-State EEG Analysis

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Jan 5, 2023


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.04.522805v1?rss=1 Authors: Ellis, C. A., Miller, R., Calhoun, V. Abstract: Machine learning methods have frequently been applied to electroencephalography (EEG) data. However, while supervised EEG classification is well-developed, relatively few studies have clustered EEG, which is problematic given the potential for clustering EEG to identify novel subtypes or patterns of dynamics that could improve our understanding of neuropsychiatric disorders. There are established methods for clustering EEG using manually extracted features that reduce the richness of the feature space for clustering, but only a couple of studies have sought to use deep-learning-based approaches with automated feature learning to cluster EEG. Those studies involve separately training an autoencoder and then performing clustering on the extracted features, and the separation of those steps can lead to poor-quality clustering. In this study, we propose an explainable convolutional autoencoder-based approach that combines model training with clustering to yield high-quality clusters. We apply the approach within the context of schizophrenia (SZ), identifying 8 EEG states characterized by varying levels of δ activity. We also find that individuals who spend more time outside of the dominant state tend to have increased negative symptom severity. Our approach represents a significant step forward for clustering resting-state EEG data and has the potential to lead to novel findings across a variety of neurological and neuropsychological disorders. Copyright belongs to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC

PaperPlayer biorxiv neuroscience
Facial representation comparisons between human brain and deep convolutional neural network reveal a fatigue repetition suppression mechanism

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Jan 3, 2023


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.02.522298v1?rss=1 Authors: Lu, Z., Ku, Y. Abstract: Repetition suppression for faces, a phenomenon in which neural responses in the visual cortex are reduced to repeated faces, has long been studied. However, the primary neural mechanism underlying repetition suppression remains debated. In recent years, artificial neural networks have achieved human-level performance in face recognition. In the current study, we combined human electroencephalography (EEG) with deep convolutional neural networks (DCNNs) and applied reverse engineering to provide a novel way to investigate the neural mechanisms of facial repetition suppression. First, we used a brain-decoding approach to explore the representations of faces and demonstrated the repetition suppression effect in human brains. Then we constructed two repetition suppression models, a Fatigue model and a Sharpening model, to modify the activations of DCNNs, and conducted cross-modal representational similarity analysis (RSA) comparisons between human EEG signals and activations in the two modified DCNNs. We found that representations in human brains were more similar to those of the Fatigue-modified DCNN than of the Sharpening-modified DCNN. Our results suggest that the facial repetition suppression effect in face perception is more likely caused by a fatigue mechanism, in which the activation of neurons with stronger responses to a face stimulus is attenuated more. The current study therefore supports the fatigue mechanism as the more plausible neural mechanism of facial repetition suppression. Comparing representations in the human brain and DCNNs provides a promising tool for simulating and inferring the brain mechanisms underlying human behaviors. Copyright belongs to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC

PaperPlayer biorxiv neuroscience
Neural correlates of face perception modeled with a convolutional recurrent neural network

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Jan 3, 2023


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.02.522523v1?rss=1 Authors: O'Reilly, J. A., Wehrman, J., Carey, A., Bedwin, J., Hourn, T., Asadi, F., Sowman, P. F. Abstract: Event-related potential (ERP) sensitivity to faces is predominantly characterized by an N170 peak that has greater amplitude and shorter latency when elicited by human faces than by images of other objects. We developed a computational model of visual ERP generation to study this phenomenon, which consisted of a convolutional neural network (CNN) connected to a recurrent neural network (RNN). We used open-access data to develop the model, generated synthetic images for simulating experiments, then collected additional data to validate predictions of these simulations. For modeling, visual stimuli presented during ERP experiments were represented as sequences of images (time x pixels). These were provided as inputs to the model. The CNN transformed these inputs into sequences of vectors that were passed to the RNN. The ERP waveforms evoked by visual stimuli were provided to the RNN as labels for supervised learning. The whole model was trained end-to-end using data from the open-access dataset to reproduce ERP waveforms evoked by visual events. Cross-validation model outputs strongly correlated with open-access (r = 0.98) and validation study data (r = 0.78). Open-access and validation study data correlated similarly (r = 0.81). Some aspects of model behavior were consistent with neural recordings while others were not, suggesting promising albeit limited capacity for modeling the neurophysiology of face-sensitive ERP generation. Copyright belongs to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC

PaperPlayer biorxiv neuroscience
Get a new perspective on EEG: Convolutional neural network encoders for parametric t-SNE.

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Dec 9, 2022


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.12.08.519691v1?rss=1 Authors: Svantesson, M., Olausson, H., Eklund, A., Thordstein, M. Abstract: Background: t-distributed stochastic neighbor embedding (t-SNE) is a method for reducing high-dimensional data to a low-dimensional representation and is mostly used for visualizing data. In parametric t-SNE, a neural network learns to reproduce this mapping. When used for EEG analysis, the data is usually first transformed into a set of features, but it is not known which features are optimal. New method: The principle of t-SNE was used to train convolutional neural network (CNN) encoders to learn to produce both a high- and a low-dimensional representation, eliminating the need for feature engineering. A simple neighbor distribution based on ranked distances was used for the high-dimensional representation instead of the traditional normal distribution. To evaluate the method, the Temple University EEG Corpus was used to create three datasets with distinct EEG characters: 1) wakefulness and sleep, 2) interictal epileptiform discharges, and 3) seizure activity. Results: The CNN encoders for the three datasets produced low-dimensional representations of the datasets with a global and local structure that conformed well to the EEG characters and generalized to new data. Comparison to existing methods: Compared to parametric t-SNE applied to either a short-time Fourier transform or a wavelet representation of the datasets, the developed CNN encoders performed equally well but generally produced a higher degree of clustering. Conclusions: The developed principle is promising and could be further developed to create general tools for exploring relations in EEG data, e.g., visual summaries of recordings and trends for continuous EEG monitoring. It might also be used to generate features for other types of machine learning. Copyright belongs to the original authors. 
Visit the link for more info. Podcast created by Paper Player, LLC

Astro arXiv | all categories
When Spectral Modeling Meets Convolutional Networks: A Method for Discovering Reionization-era Lensed Quasars in Multi-band Imaging Data

Astro arXiv | all categories

Play Episode Listen Later Nov 29, 2022 0:54


When Spectral Modeling Meets Convolutional Networks: A Method for Discovering Reionization-era Lensed Quasars in Multi-band Imaging Data by Irham Taufik Andika et al. on Tuesday 29 November Over the last two decades, around three hundred quasars have been discovered at $z \gtrsim 6$, yet only one was identified as being strong-gravitationally lensed. We explore a new approach, enlarging the permitted spectral parameter space while introducing a new spatial geometry veto criterion, implemented via image-based deep learning. We made the first application of this approach in a systematic search for reionization-era lensed quasars, using data from the Dark Energy Survey, the Visible and Infrared Survey Telescope for Astronomy Hemisphere Survey, and the Wide-field Infrared Survey Explorer. Our search method consists of two main parts: (i) pre-selection of the candidates based on their spectral energy distributions (SEDs) using catalog-level photometry and (ii) calculation of the relative probabilities of being a lens or some contaminant utilizing a convolutional neural network (CNN) classification. The training datasets are constructed by painting deflected point-source light over actual galaxy images to generate realistic galaxy-quasar lens models, optimized to find systems with small image separations, i.e., Einstein radii of $\theta_{\mathrm{E}} \leq 1$ arcsec. Visual inspection is then performed for sources with CNN scores of $P_{\mathrm{lens}} > 0.1$, which led us to obtain 36 newly selected lens candidates, awaiting spectroscopic confirmation. These findings show that automated SED modeling and deep learning pipelines, supported by modest human input, are a promising route for detecting strong lenses from large catalogs that can overcome the veto limitations of primarily dropout-based SED selection approaches. arXiv: http://arxiv.org/abs/2211.14543v1

Astro arXiv | all categories
When Spectral Modeling Meets Convolutional Networks: A Method for Discovering Reionization-era Lensed Quasars in Multi-band Imaging Data

Astro arXiv | all categories

Play Episode Listen Later Nov 28, 2022 0:54


When Spectral Modeling Meets Convolutional Networks: A Method for Discovering Reionization-era Lensed Quasars in Multi-band Imaging Data by Irham Taufik Andika et al. on Monday 28 November Over the last two decades, around three hundred quasars have been discovered at $z \gtrsim 6$, yet only one was identified as being strong-gravitationally lensed. We explore a new approach, enlarging the permitted spectral parameter space while introducing a new spatial geometry veto criterion, implemented via image-based deep learning. We made the first application of this approach in a systematic search for reionization-era lensed quasars, using data from the Dark Energy Survey, the Visible and Infrared Survey Telescope for Astronomy Hemisphere Survey, and the Wide-field Infrared Survey Explorer. Our search method consists of two main parts: (i) pre-selection of the candidates based on their spectral energy distributions (SEDs) using catalog-level photometry and (ii) calculation of the relative probabilities of being a lens or some contaminant utilizing a convolutional neural network (CNN) classification. The training datasets are constructed by painting deflected point-source light over actual galaxy images to generate realistic galaxy-quasar lens models, optimized to find systems with small image separations, i.e., Einstein radii of $\theta_{\mathrm{E}} \leq 1$ arcsec. Visual inspection is then performed for sources with CNN scores of $P_{\mathrm{lens}} > 0.1$, which led us to obtain 36 newly selected lens candidates, awaiting spectroscopic confirmation. These findings show that automated SED modeling and deep learning pipelines, supported by modest human input, are a promising route for detecting strong lenses from large catalogs that can overcome the veto limitations of primarily dropout-based SED selection approaches. arXiv: http://arxiv.org/abs/2211.14543v1

PaperPlayer biorxiv neuroscience
Unraveling Spatial-Spectral Dynamics of Speech Categorization Speed using Convolutional Neural Network

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Nov 22, 2022


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.11.21.517434v1?rss=1 Authors: Moinuddin, K. A., Havugimana, F., Al-Fahad, R., Bidelman, G., Yeasin, M. Abstract: The process of categorizing sounds into distinct phonetic categories is known as categorical perception (CP). Response times (RTs) provide a measure of perceptual difficulty during labeling decisions (i.e., categorization). The RT is quasi-stochastic in nature due to individuality and variations in perceptual tasks. To identify the source of RT variation in CP, we have built models to decode the brain regions and frequency bands driving fast, medium, and slow response decision speeds. In particular, we implemented a parameter-optimized convolutional neural network (CNN) to classify listeners' behavioral RTs from their neural EEG data. We adopted visual interpretation of the model response using Guided-GradCAM to identify spatial-spectral correlates of RT. Our framework includes (but is not limited to): (i) a data augmentation technique designed to reduce noise and control the overall variance of the EEG dataset; (ii) bandpower topomaps to learn the spatial-spectral representation using a CNN; (iii) large-scale Bayesian hyper-parameter optimization to find the best-performing CNN model; (iv) ANOVA and post-hoc analysis on Guided-GradCAM activation values to measure the effect of neural regions and frequency bands on behavioral responses. Using this framework, we observe that β (10-20 Hz) activity over the left frontal, right prefrontal/frontal, and right cerebellar regions is correlated with RT variation. Our results indicate that attention, template matching, temporal prediction of acoustics, motor control, and decision uncertainty are the most probable factors in RT variation. Copyright belongs to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC
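Bandpower features like the β topomaps mentioned in this abstract reduce to integrating a power spectral density over a frequency band, per channel. A rough single-channel sketch follows; the paper's exact estimator and windowing are not stated here, so this simple periodogram version is an assumption:

```python
import numpy as np

def band_power(x, fs, lo, hi):
    """Mean periodogram power of signal x (sampled at fs Hz) within [lo, hi] Hz."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))  # one-sided periodogram
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].mean()

fs = 256
t = np.arange(4 * fs) / fs
eeg = np.sin(2 * np.pi * 15 * t)  # a pure 15 Hz oscillation in the 10-20 Hz band
print(band_power(eeg, fs, 10, 20) > band_power(eeg, fs, 30, 40))  # True
```

Computing this per electrode and per band, then arranging the values by scalp position, yields the kind of bandpower topomap a CNN can consume as an image.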

Papers Read on AI
What Makes Convolutional Models Great on Long Sequence Modeling?

Papers Read on AI

Play Episode Listen Later Nov 2, 2022 28:07


Convolutional models have been widely used in multiple domains. However, most existing models only use local convolution, making the model unable to handle long-range dependencies efficiently. Attention overcomes this problem by aggregating global information based on pair-wise attention scores, but it also makes the computational complexity quadratic in the sequence length. S4 can be efficiently implemented as a global convolutional model whose kernel size equals the input sequence length. With the Fast Fourier Transform, S4 can model much longer sequences than Transformers and achieve significant gains over SoTA on several long-range tasks. Despite its empirical success, S4 is involved: it requires sophisticated parameterization and initialization schemes that combine the wisdom from several prior works. As a result, S4 is less intuitive and hard to use for researchers with limited prior knowledge. Here we aim to demystify S4 and extract basic principles that contribute to the success of S4 as a global convolutional model. 2022: Yuhong Li, Tianle Cai, Yi Zhang, De-huai Chen, Debadeepta Dey https://arxiv.org/pdf/2210.09298v1.pdf
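The FFT trick that makes a kernel as long as the input tractable can be sketched in a few lines. This is a generic circular-convolution illustration of the convolution theorem, not the S4 parameterization itself:

```python
import numpy as np

def global_conv_fft(u, k):
    """Circular convolution of input u with a kernel as long as u,
    computed in O(L log L) time via the FFT convolution theorem."""
    L = len(u)
    return np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(k, n=L)))

rng = np.random.default_rng(0)
u, k = rng.standard_normal(8), rng.standard_normal(8)
# Matches the O(L^2) direct circular convolution
direct = np.array([sum(u[(i - j) % 8] * k[j] for j in range(8)) for i in range(8)])
print(np.allclose(global_conv_fft(u, k), direct))  # True
```

Replacing the O(L^2) sum with two FFTs and a pointwise product is what lets global-kernel models scale to sequences far beyond what quadratic attention can handle.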

The Machine Learning Podcast
Building A Business Powered By Machine Learning At Assembly AI

The Machine Learning Podcast

Play Episode Listen Later Sep 9, 2022 58:42


Summary The increasing sophistication of machine learning has enabled dramatic transformations of businesses and introduced new product categories. At Assembly AI they are offering advanced speech recognition and natural language models as an API service. In this episode founder Dylan Fox discusses the unique challenges of building a business with machine learning as the core product. Announcements Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery. Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out! Your host is Tobias Macey and today I’m interviewing Dylan Fox about building and growing a business with ML as its core offering Interview Introduction How did you get involved in machine learning? Can you describe what Assembly is and the story behind it? For anyone who isn’t familiar with your platform, can you describe the role that ML/AI plays in your product? What was your process for going from idea to prototype for an AI powered business? Can you offer parallels between your own experience and that of your peers who are building businesses oriented more toward pure software applications? How are you structuring your teams? On the path to your current scale and capabilities how have you managed scoping of your model capabilities and operational scale to avoid getting bogged down or burnt out? 
How do you think about scoping of model functionality to balance composability and system complexity? What is your process for identifying and understanding which problems are suited to ML and when to rely on pure software? You are constantly iterating on model performance and introducing new capabilities. How do you manage prototyping and experimentation cycles? What are the metrics that you track to identify whether and when to move from an experimental to an operational state with a model? What is your process for understanding what’s possible and what can feasibly operate at scale? Can you describe your overall operational patterns and delivery process for ML? What are some of the most useful investments in tooling that you have made to manage development experience for your teams? Once you have a model in operation, how do you manage performance tuning? (from both a model and an operational scalability perspective) What are the most interesting, innovative, or unexpected aspects of ML development and maintenance that you have encountered while building and growing the Assembly platform? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Assembly? When is ML the wrong choice? What do you have planned for the future of Assembly? Contact Info @YouveGotFox on Twitter LinkedIn Parting Question From your perspective, what is the biggest barrier to adoption of machine learning today? Closing Announcements Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Links Assembly AI Podcast.__init__ Episode Learn Python the Hard Way NLTK NLP == Natural Language Processing NLU == Natural Language Understanding Speech Recognition Tensorflow r/machinelearning SciPy PyTorch Jax HuggingFace RNN == Recurrent Neural Network CNN == Convolutional Neural Network LSTM == Long Short Term Memory Hidden Markov Models Baidu DeepSpeech CTC (Connectionist Temporal Classification) Loss Model Twilio Grid Search K80 GPU A100 GPU TPU == Tensor Processing Unit Foundation Models BLOOM Language Model DALL-E 2 The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Papers Read on AI
Lenia and Expanded Universe

Papers Read on AI

Play Episode Listen Later Oct 12, 2021 30:45


We report experimental extensions of Lenia, a continuous cellular automata family capable of producing lifelike self-organizing autonomous patterns. The rule of Lenia was generalized into higher dimensions, multiple kernels, and multiple channels. The final architecture approaches what can be seen as a recurrent convolutional neural network. Using semi-automatic search e.g. genetic algorithm, we discovered new phenomena like polyhedral symmetries, individuality, self-replication, emission, growth by ingestion, and saw the emergence of "virtual eukaryotes" that possess internal division of labor and type differentiation. We discuss the results in the contexts of biology, artificial life, and artificial intelligence. 2020: B. Chan Genetic algorithm, Artificial intelligence, Convolutional neural network, Artificial life, Self-organization, Cellular automaton, Self-replication, Artificial neural network, Polyhedron, Organizing (structure), Autonomous robot, Emergence, Recurrent neural network, Semiconductor industry https://arxiv.org/pdf/2005.03742v1.pdf
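A single Lenia-style update (convolve the state with a kernel, apply a growth mapping, clip) can be sketched as follows. The ring kernel and bell-shaped growth function here are illustrative choices under assumed parameter values, not the exact rules from the paper:

```python
import numpy as np

def lenia_step(world, kernel_fft, growth, dt=0.1):
    """One update of a Lenia-style continuous CA: convolve the state with a
    (pre-transformed) kernel via FFT, apply a growth mapping, clip to [0, 1]."""
    potential = np.real(np.fft.ifft2(np.fft.fft2(world) * kernel_fft))
    return np.clip(world + dt * growth(potential), 0.0, 1.0)

# Hypothetical bell-shaped growth function centered on mu with width sigma
growth = lambda u, mu=0.15, sigma=0.015: 2 * np.exp(-((u - mu) ** 2) / (2 * sigma**2)) - 1

rng = np.random.default_rng(0)
world = rng.random((64, 64))
# Ring-shaped kernel, normalized to sum to 1, then shifted so its center sits at the origin
y, x = np.ogrid[-32:32, -32:32]
r = np.hypot(x, y) / 12
kernel = np.exp(-((r - 0.5) ** 2) / 0.02) * (r < 1)
kernel /= kernel.sum()
kernel_fft = np.fft.fft2(np.fft.ifftshift(kernel))
world = lenia_step(world, kernel_fft, growth)
```

Iterating this step is what produces the self-organizing patterns; the generalizations the paper describes stack multiple kernels and channels on the same convolve-grow-clip loop.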

Hacker Public Radio
HPR3319: Linux Inlaws S01E28: Politicians and artificial intelligence part 1

Hacker Public Radio

Play Episode Listen Later Apr 22, 2021


In this episode, our two heroes explore the realm of artificial intelligence, paying special attention to deep learning (hoping that some of the stuff may rub off on them :-). In this first part of a three-part mini-series the chaps discuss the foundations, including networks, neurons and other topics of advanced black magic, carefully avoiding the temptation of introducing too much maths (we'll leave this to the Grumpy Old Coders :-). Links: Artificial intelligence: https://en.wikipedia.org/wiki/Artificial_intelligence Machine learning: https://www.mygreatlearning.com/blog/machine-learning-tutorial Deep learning: https://www.guru99.com/deep-learning-tutorial.html Artificial neural networks (ANN): https://www.asimovinstitute.org/neural-network-zoo Back-propagation ANNs (BPN): https://en.wikipedia.org/wiki/Backpropagation DWAVE: https://www.dwavesys.com/quantum-computing Convolutional neural networks (CNNs): https://en.wikipedia.org/wiki/Convolutional_neural_network Generative adversarial network (GAN): https://en.wikipedia.org/wiki/Generative_adversarial_network Spy vs. Spy: http://toonopedia.com/spyvsspy.htm Atlantik Ale: https://www.stoertebeker.com/stoertebeker-atlantik-ale.html

Dave Lee on Investing
Tesla's FSD Hardware Advantage w/ James Douma #9 (Ep. 274)

Dave Lee on Investing

Play Episode Listen Later Mar 18, 2021 51:44


I'm joined by James Douma as we discuss Tesla's approach to FSD hardware and why it's built for success. View part 2 of this conversation, https://youtu.be/CFtBdWN3t_Y . James Douma on Twitter: https://twitter.com/jamesdouma James Douma Playlist, https://www.youtube.com/watch?v=iMtujONU_0I&list=PLfibpgBinf9R7KIedEU3y-YjrA63LSKHX Timestamps 00:00 - Introduction 0:39 - Does Tesla FSD have a hardware advantage? 6:58 - Tesla's made the best chip for FSD 8:20 - Energy consumption of FSD chip 12:15 - 2300 frames per second 17:40 - What will the next generation chip be used for? 21:23 - Will the next generation chip be a drop-in replacement? 22:17 - How will the next generation chip be rolled out? 28:55 - Convolutional neurons 36:15 - Back propagation and self-fixing Neural Nets 41:20 - Human inputs 45:12 - What resolution are the images? 48:22 - What percent of processing capability is needed to run FSD? 50:15 - Conclusion

Dave Lee on Investing
Tesla’s FSD Hardware Advantage Part 1 w/ James Douma (Ep. 274)

Dave Lee on Investing

Play Episode Listen Later Mar 18, 2021 51:44


I'm joined by James Douma as we discuss Tesla's approach to FSD hardware and why it's built for success. James Douma on Twitter: https://twitter.com/jamesdouma James Douma Playlist, https://www.youtube.com/watch?v=iMtujONU_0I&list=PLfibpgBinf9R7KIedEU3y-YjrA63LSKHX Timestamps 00:00 - Introduction 0:39 - Does Tesla FSD have a hardware advantage? 6:58 - Tesla's made the best chip for FSD 8:20 - Energy consumption of FSD chip 12:15 - 2300 frames per second 17:40 - What will the next generation chip be used for? 21:23 - Will the next generation chip be a drop-in replacement? 22:17 - How will the next generation chip be rolled out? 28:55 - Convolutional neurons 36:15 - Back propagation and self-fixing Neural Nets 41:20 - Human inputs 45:12 - What resolution are the images? 48:22 - What percent of processing capability is needed to run FSD? 50:15 - Conclusion


Focal Point: the IMV imaging podcast
Artificial Intelligence in Veterinary Practice

Focal Point: the IMV imaging podcast

Play Episode Listen Later Feb 27, 2021 37:13


Links https://www.vetport.com/artificial-intelligence-in-veterinary-medicine https://www.veterinarypracticenews.com/ai-diagnostics-january-2020/ https://www.vetpartners.org/how-can-ai-improve-veterinary-medicine/ https://www.veteldiagnostics.com/artificial-intelligence-now-required-practice-good-medicine https://www.aavr.org/index.php?option=com_content&view=article&id=99&Itemid=263 https://ivcjournal.com/veterinary-radiology-ai/ https://vetology.ai/ https://www.vetmed.ucdavis.edu/news/veterinarians-use-artificial-intelligence-aid-diagnosis-addisons-disease https://cedar.ucdavis.edu/ https://www.signalpet.com/products/signalray/ Convolutional neural network - Wikipedia How We Do It – Atomwise https://stanfordmlgroup.github.io/competitions/chexpert/ https://www.kaggle.com/nih-chest-xrays/data https://mimic-cxr.mit.edu/ https://physionet.org/content/mimic-cxr/2.0.0/ https://www.nature.com/articles/s41597-019-0322-0 https://github.com/MIT-LCP/mimic-cxr https://qure.ai/ https://pytorch.org/hub/pytorch_vision_googlenet/ https://pytorch.org/hub/pytorch_vision_resnet/ https://www.nlm.nih.gov/healthit/snomedct/index.html https://en.wikipedia.org/wiki/Artificial_intelligence https://medium.com/fintechexplained/neural-networks-activation-function-to-back-propagation-understanding-neural-networks-bdd036c3f29f https://medium.com/fintechexplained/neural-networks-bias-and-weights-10b53e6285da Contact Us email: clinical@imv-imaging.com website: www.imv-imaging.co.uk

PaperPlayer biorxiv bioinformatics
CAMP: a Convolutional Attention-based Neural Network for Multifaceted Peptide-protein Interaction Prediction

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Nov 16, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.16.384784v1?rss=1 Authors: Lei, Y., Li, S., Liu, Z., Wan, F., Tian, T., Li, S., Zhao, D., Zeng, J. Abstract: Peptide-protein interactions (PepPIs) are involved in various fundamental cellular functions, and their identification is crucial for designing efficacious peptide therapeutics. To facilitate the peptide drug discovery process, a number of computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Although several deep-learning-based frameworks have been proposed to predict compound-protein or protein-protein interactions, few are designed specifically to predict peptide-protein interactions. In this paper, we present CAMP, a sequence-based Convolutional Attention-based neural network for Multifaceted prediction of Peptide-protein interactions, which predicts both binary peptide-protein interactions and the corresponding binding residues in the peptides. We also construct a benchmark dataset containing high-quality peptide-protein interaction pairs with the corresponding peptide binding residues for model training and evaluation. CAMP incorporates convolutional neural network architectures and an attention mechanism to fully exploit informative sequence-based features, including secondary structures, physicochemical properties, intrinsic disorder features and the position-specific scoring matrix of the protein. Systematic evaluation on our benchmark dataset demonstrates that CAMP outperforms the state-of-the-art baseline methods on binary peptide-protein interaction prediction. In addition, CAMP can successfully identify the binding residues involved in non-covalent interactions for peptides. 
These results indicate that CAMP can serve as a useful tool in peptide-protein interaction prediction and peptide binding site identification, which can thus greatly facilitate the peptide drug discovery process. The source code of CAMP can be found at https://github.com/twopin/CAMP. Copyright belongs to original authors. Visit the link for more info
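The convolution-plus-attention idea at the core of CAMP can be illustrated in miniature. This is not the authors' implementation (their code is at the linked GitHub repository); it is a minimal NumPy sketch with made-up names and random weights, in which width-3 filters scan a toy residue embedding and an attention vector pools the resulting features into one summary vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    # x: (L, d_in); kernels: (k, d_in, d_out) -> ReLU features of shape (L-k+1, d_out)
    k = kernels.shape[0]
    out = np.stack([np.tensordot(x[i:i + k], kernels, axes=([0, 1], [0, 1]))
                    for i in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)

def attention_pool(h, w):
    # h: (L', d); w: (d,) -> softmax-weighted summary vector of shape (d,)
    scores = h @ w
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ h

peptide = rng.normal(size=(20, 8))       # 20 residues, 8-dim embeddings (toy)
kernels = rng.normal(size=(3, 8, 16))    # width-3 filters, 16 output channels
w_att = rng.normal(size=16)              # attention query vector

feat = attention_pool(conv1d(peptide, kernels), w_att)
print(feat.shape)  # (16,)
```

In the full model, summary vectors like this from the peptide and protein branches would be combined for the binary interaction prediction, while the per-position attention weights point at candidate binding residues.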

PaperPlayer biorxiv neuroscience
Optimising a Simple Fully Convolutional Network (SFCN) for accurate brain age prediction in the PAC 2019 challenge

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Nov 11, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.10.376970v1?rss=1 Authors: Gong, W., Beckmann, C. F., Vedaldi, A., Smith, S. M., Peng, H. Abstract: Brain age prediction from brain MRI scans not only helps improve brain ageing modelling generally, but also provides benchmarks for predictive analysis methods. Brain-age delta, the difference between a subject's predicted age and true age, has become a meaningful biomarker for the health of the brain. Here, we report the details of our brain age prediction models and results in the Predictive Analysis Challenge 2019. The aim of the challenge was to use T1-weighted brain MRIs to predict a subject's age in multicentre datasets. We applied a lightweight deep convolutional neural network architecture, the Simple Fully Convolutional Neural Network (SFCN), and combined several techniques, including data augmentation, transfer learning, model ensembling, and bias correction, for brain age prediction. The model achieved first place in both objectives of the PAC 2019 brain age prediction challenge: mean absolute error (MAE) = 2.90 years without bias removal, and MAE = 2.95 years with bias removal. Copyright belongs to original authors. Visit the link for more info
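The "bias correction" mentioned in the abstract refers to a well-known property of brain-age models: predictions regress toward the mean, so brain-age delta correlates negatively with true age. A common remedy (sketched here with simulated numbers, not the paper's data) is to fit a line to the delta and subtract it:

```python
import numpy as np

rng = np.random.default_rng(1)

true_age = rng.uniform(45, 80, size=200)
# Simulate the usual regression-to-the-mean bias in predicted age
pred_age = 0.8 * true_age + 0.2 * true_age.mean() + rng.normal(0, 2, 200)

# Fit delta = a * true_age + b, then remove the linear trend
delta = pred_age - true_age
a, b = np.polyfit(true_age, delta, 1)
pred_corrected = pred_age - (a * true_age + b)

corr_before = np.corrcoef(true_age, delta)[0, 1]
corr_after = np.corrcoef(true_age, pred_corrected - true_age)[0, 1]
print(round(corr_before, 2), round(abs(corr_after), 6))
```

After correction the residual delta is uncorrelated with true age, which is why the paper reports MAE both with and without bias removal.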

PaperPlayer biorxiv bioinformatics
Predicting drug resistance in M. tuberculosis using a Long-term Recurrent Convolutional Networks architecture

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Nov 8, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.07.372136v1?rss=1 Authors: Safari, A. H., Sedaghat, N., Forna, A., Zabeti, H., Chindelevitch, L., Libbrecht, M. Abstract: Drug resistance in Mycobacterium tuberculosis (MTB) may soon be a leading worldwide cause of death. One way to mitigate the risk of drug resistance is through methods that predict drug resistance in MTB using whole-genome sequencing (WGS) data. Existing machine learning methods for this task featurize the WGS data from a given bacterial isolate by defining one input feature per SNP. Here, we introduce a gene-centric method for predicting drug resistance in TB. We define one feature per gene according to the number of mutations in that gene in a given isolate. This representation greatly decreases the number of model parameters. We further propose a model that considers gene order through a Long-term Recurrent Convolutional Network (LRCN) architecture, which combines convolutional and recurrent layers. We find that using these strategies yields a substantial, statistically significant improvement over the state-of-the-art, and that this improvement is driven by the order of genes in the genome and their organization into operons. Copyright belongs to original authors. Visit the link for more info
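The gene-centric featurization step (one count per gene instead of one feature per SNP) is simple to make concrete. The gene names and coordinates below are illustrative placeholders, not the paper's annotation:

```python
from collections import Counter

# Hypothetical gene annotation: gene -> (start, end) position on the genome
genes = {"rpoB": (100, 400), "katG": (500, 900), "gyrA": (950, 1200)}

def gene_features(snp_positions, genes):
    """Collapse per-SNP features into one mutation count per gene."""
    counts = Counter()
    for pos in snp_positions:
        for gene, (start, end) in genes.items():
            if start <= pos <= end:
                counts[gene] += 1
    return [counts[g] for g in sorted(genes)]

# One isolate's SNPs, given by genomic position
isolate = [120, 130, 860, 1100, 1150, 1190]
print(gene_features(isolate, genes))  # [3, 1, 2] for gyrA, katG, rpoB
```

The resulting per-gene vector, ordered by genome position, is what an LRCN-style model can then scan with convolutional and recurrent layers.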

PaperPlayer biorxiv neuroscience
A Convolutional Network Architecture Driven by Mouse Neuroanatomical Data

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Oct 25, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.23.353151v1?rss=1 Authors: Shi, J., Buice, M. A., Shea-Brown, E., Mihalas, S., Tripp, B. P. Abstract: Convolutional neural networks trained on object recognition derive some inspiration from the neuroscience of the visual system in primates, and have been used as models of the feedforward computation performed in the primate ventral stream. In contrast to the hierarchical organization of primates, the visual system of the mouse has a flatter hierarchy. Since mice are capable of visually guided behavior, this raises questions about the role of architecture in neural computation. In this work, we introduce a framework for building a biologically constrained convolutional neural network model of lateral areas of the mouse visual cortex. The structural parameters of the network are derived from experimental measurements, specifically estimates of numbers of neurons in each area and cortical layer, the interareal connectome, and the statistics of connections between cortical layers. This network is constructed to support detailed task-optimized models of mouse visual cortex, with neural populations that can be compared to specific corresponding populations in the mouse brain. The code is freely available to support such research. Copyright belongs to original authors. Visit the link for more info

PaperPlayer biorxiv bioinformatics
High precision in microRNA prediction: a novel genome-wide approach based on convolutional deep residual networks

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Oct 25, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.23.352179v1?rss=1 Authors: Yones, C. A., Raad Macchiaroli, J., Bugnon, L., Milone, D. H., Stegmayer, G. Abstract: Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that have a key role in the regulation of gene expression. The importance of miRNAs is widely acknowledged by the community, and precise prediction of novel candidates with computational methods is still much needed. This could be done by searching for homologs with sequence alignment tools, but this is restricted to sequences very similar to the known miRNA precursors (pre-miRNAs). Furthermore, other important properties of pre-miRNAs, such as the secondary structure, are not taken into account by these methods. Many machine learning approaches were proposed in recent years to fill this gap, but these methods were tested under very controlled conditions, which are not fulfilled, for example, when predicting in newly sequenced genomes where no miRNAs are known. If these methods are used under real conditions, the precision achieved is far from the one published. Results: This work provides a novel approach for the computational prediction of pre-miRNAs: a convolutional deep residual neural network. The proposed model has been tested on several complete genomes of animals and plants, achieving precision up to 5 times higher than other approaches at the same recall rates. Also, a novel validation methodology is used to ensure that the reported performance can be achieved when using the method on new, unknown species. Availability: To provide fast and easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. It can process fasta files with multiple sequences to calculate prediction scores, and can generate nucleotide importance plots. 
The full source code of this project is available at http://sourceforge.net/projects/sourcesinc/files/mirdnn Contact: cyones@sinc.unl.edu.ar Copyright belongs to original authors. Visit the link for more info
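The building block of a "deep residual" network like mirDNN is a convolution stack with a skip connection, so each block learns a correction to its input rather than a full transformation. A minimal NumPy sketch (toy shapes and random weights, not the mirDNN code) for a 1-D sequence input:

```python
import numpy as np

rng = np.random.default_rng(2)

def conv1d_same(x, w):
    # x: (L, c); w: (k, c, c) with odd k; zero-padded 'same' convolution
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
                     for i in range(x.shape[0])])

def residual_block(x, w1, w2):
    # Two convolutions with a skip connection: out = ReLU(x + conv(ReLU(conv(x))))
    h = np.maximum(conv1d_same(x, w1), 0.0)
    return np.maximum(x + conv1d_same(h, w2), 0.0)

x = rng.normal(size=(64, 4))              # encoded sequence, 4 channels (toy)
w1 = rng.normal(size=(3, 4, 4)) * 0.1
w2 = rng.normal(size=(3, 4, 4)) * 0.1
out = residual_block(x, w1, w2)
print(out.shape)  # shape preserved: (64, 4)
```

Because the skip path preserves the input, such blocks can be stacked deeply without the gradient vanishing, which is what makes "deep residual" architectures trainable.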

PaperPlayer biorxiv neuroscience
Multidimensional face representation in deep convolutional neural network reveals the mechanism underlying AI racism

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Oct 21, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.20.347898v1?rss=1 Authors: Tian, J., Xie, H., Hu, S., Liu, J. Abstract: The increasingly popular application of AI runs the risk of amplifying social bias, such as classifying non-white faces as animals. Recent research has attributed the bias largely to the data used for training. However, the underlying mechanism is little known, and therefore strategies to rectify the bias are unresolved. Here we examined a typical deep convolutional neural network (DCNN), VGG-Face, which was trained with a face dataset consisting of more white faces than black and Asian faces. The transfer learning result showed significantly better performance in identifying white faces, mirroring the well-known social bias in humans, the other-race effect (ORE). To test whether the effect resulted from the imbalance of face images, we retrained VGG-Face with a dataset containing more Asian faces, and found a reversed ORE: the newly trained VGG-Face preferred Asian faces over white faces in identification accuracy. In addition, when the numbers of Asian and white faces in the dataset were matched, the DCNN did not show any bias. To further examine how imbalanced image input led to the ORE, we performed representational similarity analysis on VGG-Face's activations. We found that when the dataset contained more white faces, the representation of white faces was more distinct, indexed by smaller ingroup similarity and larger representational Euclidean distance. That is, white faces were scattered more sparsely in the representational face space of VGG-Face than the other faces. Importantly, the distinctiveness of faces was positively correlated with identification accuracy, which explained the ORE observed in VGG-Face. In sum, our study revealed the mechanism underlying the ORE in DCNNs, which provides a novel approach to studying AI ethics. 
In addition, the multidimensional face representation theory discovered in humans was found to apply to DCNNs as well, motivating future studies to apply more cognitive theories to understanding DCNNs' behavior. Copyright belongs to original authors. Visit the link for more info
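The two distinctiveness indices used in the analysis, mean ingroup similarity and mean representational Euclidean distance, are straightforward to compute from a matrix of activations. A toy NumPy sketch (synthetic activations, not VGG-Face features) in which one group is deliberately more spread out:

```python
import numpy as np

rng = np.random.default_rng(3)

def ingroup_stats(acts):
    """Mean pairwise cosine similarity and Euclidean distance within a group."""
    n = len(acts)
    sims, dists = [], []
    for i in range(n):
        for j in range(i + 1, n):
            a, b = acts[i], acts[j]
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
            dists.append(np.linalg.norm(a - b))
    return np.mean(sims), np.mean(dists)

# Toy activations: group A is more spread out ("more distinct") than group B
group_a = rng.normal(scale=2.0, size=(30, 50))
group_b = rng.normal(scale=0.5, size=(30, 50))

sim_a, dist_a = ingroup_stats(group_a)
sim_b, dist_b = ingroup_stats(group_b)
print(dist_a > dist_b)  # the more distinct group has larger pairwise distances
```

In the paper's terms, the overrepresented race plays the role of group A: its faces sit farther apart in representational space, which tracks the higher identification accuracy.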

PaperPlayer biorxiv neuroscience
Epileptic Spike Detection by Using a Linear-Phase Convolutional Neural Network

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Oct 9, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.08.330936v1?rss=1 Authors: Fukumori, K., Yoshida, N., Sugano, H., Nakajima, M., Tanaka, T. Abstract: To cope with the lack of highly skilled professionals, machine learning with proper signal processing techniques is key to establishing automated diagnostic-aid technologies for epileptic electroencephalogram (EEG) testing. In particular, frequency filtering with appropriate passbands is essential to enhance biomarkers, such as epileptic spike waves, that are noted in the EEG. This paper introduces a novel class of convolutional neural networks (CNNs) having a bank of linear-phase finite impulse response filters at the first layer. These may behave as bandpass filters that extract biomarkers without destroying waveforms, because of the linear-phase condition. The proposed CNNs were trained with a large amount of clinical EEG data, including 15,899 epileptic spike waveforms recorded from 50 patients, which have been labeled by specialists. Experimental results show that the trained data-driven filter bank with supervised learning is dyadic, like the discrete wavelet transform. Moreover, the area under the curve exceeded 0.9 in most cases. Copyright belongs to original authors. Visit the link for more info
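A linear-phase FIR filter is one whose tap coefficients are symmetric (or antisymmetric), so every frequency is delayed by the same amount and the waveform shape is preserved. The sketch below designs a symmetric bandpass filter with the classic window method; the passband numbers are illustrative, not taken from the paper, where the filters are learned from data:

```python
import numpy as np

def linear_phase_bandpass(num_taps, low, high, fs):
    """Symmetric (hence linear-phase) FIR bandpass via the window method."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    def sinc_lowpass(fc):
        # Ideal lowpass impulse response with cutoff fc, sampled at the taps
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)
    # Bandpass = difference of two lowpass responses, tapered by a window
    return (sinc_lowpass(high) - sinc_lowpass(low)) * np.hamming(num_taps)

h = linear_phase_bandpass(num_taps=101, low=2.0, high=7.0, fs=256.0)

# Linear phase holds because the taps are symmetric about the centre
print(np.allclose(h, h[::-1]))  # True
```

In the paper's architecture the first CNN layer is constrained to produce filters of exactly this symmetric form, so training discovers the passbands while the linear-phase guarantee keeps spike waveforms undistorted.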

PaperPlayer biorxiv bioinformatics
Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Oct 7, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.10.05.326140v1?rss=1 Authors: Li, Y., Zhang, C., Bell, E. W., Zheng, W., Zhou, X., Yu, D.-J., Zhang, Y. Abstract: The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP and CAMEO experiments, and outperformed other state-of-the-art methods by at least 58.4% for the CASP 11&12 and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library. Availability: The training and testing data, standalone package, and the online server for TripletRes are available at https://zhanglab.ccmb.med.umich.edu/TripletRes/. Copy rights belong to original authors. Visit the link for more info
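The "top-L long-range contact precision" metric quoted in the TripletRes results has a compact definition: rank the predicted scores for residue pairs separated by at least 24 positions, take the top L (or L/5) pairs for a protein of length L, and measure what fraction are true contacts. A minimal NumPy sketch with a synthetic contact map and a near-oracle predictor (everything here is illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(4)

def top_l_precision(scores, contacts, frac=0.2, min_sep=24):
    """Precision of the top-(L*frac) predicted long-range contacts.
    scores, contacts: (L, L) matrices; |i-j| >= min_sep counts as long-range."""
    L = scores.shape[0]
    i, j = np.triu_indices(L, k=min_sep)          # long-range pairs only
    order = np.argsort(scores[i, j])[::-1]        # highest scores first
    top = order[: max(1, int(L * frac))]
    return contacts[i[top], j[top]].mean()

L = 100
contacts = (rng.random((L, L)) < 0.05).astype(float)
contacts = np.triu(contacts) + np.triu(contacts, 1).T   # symmetrise
oracle_scores = contacts + 1e-3 * rng.random((L, L))    # near-oracle predictor

print(top_l_precision(oracle_scores, contacts))  # 1.0 for an oracle predictor
```

With frac=0.2 this computes top-L/5 precision; frac=1.0 gives the top-L variant reported for the CASP and CAMEO comparisons.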

PaperPlayer biorxiv bioinformatics
A Convolutional Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Oct 1, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.29.318642v1?rss=1 Authors: Ke, Z., Vikalo, H. Abstract: Haplotype assembly and viral quasispecies reconstruction are challenging tasks concerned with analysis of genomic mixtures using sequencing data. High-throughput sequencing technologies generate enormous amounts of short fragments (reads) which essentially oversample components of a mixture; the representation redundancy enables reconstruction of the components (haplotypes, viral strains). The reconstruction problem, known to be NP-hard, boils down to grouping together reads originating from the same component in a mixture. Existing methods struggle to solve this problem with the required level of accuracy and low runtimes; the problem is becoming increasingly challenging as the number and length of the components increase. This paper proposes a read clustering method based on a convolutional auto-encoder designed to first project sequenced fragments to a low-dimensional space and then estimate the probability of the read origin using learned embedded features. The components are reconstructed by finding consensus sequences that agglomerate reads from the same origin. Mini-batch stochastic gradient descent and dimension reduction of reads allow the proposed method to efficiently deal with massive numbers of long reads. Experiments on simulated, semi-experimental and experimental data demonstrate the ability of the proposed method to accurately reconstruct haplotypes and viral quasispecies, often achieving superior performance compared to state-of-the-art methods. Copyright belongs to original authors. Visit the link for more info

PaperPlayer biorxiv bioinformatics
ACP-MHCNN: An Accurate Multi-Headed Deep-Convolutional Neural Network to Predict Anticancer peptides

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Sep 28, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.25.313668v1?rss=1 Authors: Ahmed, S., Muhammod, R., Adilina, S., Khan, Z. H., Shatabda, S., Dehzangi, A. Abstract: Although advancing therapeutic alternatives for treating deadly cancers has gained much attention globally, primary methods such as chemotherapy still have significant downsides and low specificity. Most recently, anticancer peptides (ACPs) have emerged as a potential therapeutic alternative with far fewer negative side effects. However, the identification of ACPs through wet-lab experiments is expensive and time-consuming. Hence, computational methods have emerged as viable alternatives. During the past few years, several computational ACP identification techniques using hand-engineered features have been proposed to solve this problem. In this study, we propose a new multi-headed deep convolutional neural network model, called ACP-MHCNN, for extracting and combining discriminative features from different information sources in an interactive way. Our model extracts sequence-based, physicochemical, and evolutionary features for ACP identification through simultaneous interaction with different numerical peptide representations, while restraining parameter overhead. It is evident through rigorous experiments using cross-validation and an independent dataset that ACP-MHCNN outperforms other models for anticancer peptide identification by a substantial margin. ACP-MHCNN outperforms the state-of-the-art model by 6.3%, 8.6%, 3.7%, 4.0%, and 0.20 in terms of accuracy, sensitivity, specificity, precision, and MCC, respectively. ACP-MHCNN and its relevant codes and datasets are publicly available at: https://github.com/mrzResearchArena/Anticancer-Peptides-CNN. Copyright belongs to original authors. Visit the link for more info

PaperPlayer biorxiv bioinformatics
DeepInsight-FS: Selecting features for non-image data using convolutional neural network

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Sep 19, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.17.301515v1?rss=1 Authors: Sharma, A., Lysenko, A., Boroevich, K., Vans, E., Tsunoda, T. Abstract: Identifying smaller element or gene subsets from biological or other data types is an essential step in discovering underlying mechanisms. Statistical machine learning methods have played a key role in revealing gene subsets. However, growing data complexity is pushing the limits of these techniques. A review of the recent literature shows that arranging elements by similarity in image form for a convolutional neural network (CNN) improves classification performance over treating them individually. Expanding on this, here we present a pipeline, DeepInsight-FS, to uncover gene subsets of clinical relevance. DeepInsight-FS converts non-image samples into image form and performs element selection via CNN. To our knowledge, this is the first approach to employ a CNN for element or gene selection on non-image data. A real-world application of DeepInsight-FS to publicly available cancer data identified gene sets with significant overlap with several cancer-associated pathways, suggesting the potential of this method to discover biomedically meaningful connections. Copyright belongs to original authors. Visit the link for more info
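The key trick, converting a non-image feature vector into an image by placing similar features near each other, can be sketched very roughly. DeepInsight-style layouts are typically derived from a nonlinear embedding of feature similarity; the crude correlation-based ordering below is only a stand-in to show the mechanics, and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)

def to_image(sample, order, side):
    """Place a 1-D feature vector into a 2-D grid using a fixed feature ordering."""
    img = np.zeros(side * side)
    img[: len(sample)] = sample[order]
    return img.reshape(side, side)

X = rng.normal(size=(100, 64))              # 100 samples, 64 genes (toy data)
# Order features so that correlated genes tend to sit near each other
corr = np.corrcoef(X.T)
order = np.argsort(corr[0])                 # crude similarity ordering

img = to_image(X[0], order, side=8)
print(img.shape)  # (8, 8): one "image" per sample, ready for a CNN
```

Once every sample is rendered this way, a standard 2-D CNN can be trained on the images, and feature selection proceeds by scoring the pixels (i.e. genes) that drive the CNN's decisions.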

PaperPlayer biorxiv bioinformatics
scGCN: a Graph Convolutional Networks Algorithm for Knowledge Transfer in Single Cell Omics

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Sep 14, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.13.295535v1?rss=1 Authors: Song, Q., Su, J., Zhang, W. Abstract: Single-cell omics represent the fastest-growing genomics data type in the literature and in public genomics repositories. Leveraging the growing repository of labeled datasets and transferring labels from existing datasets to newly generated datasets will empower the exploration of single-cell omics. Current label transfer methods have limited performance, largely due to the intrinsic heterogeneity and extrinsic differences between datasets. Here, we present a robust graph-based artificial intelligence model, single-cell Graph Convolutional Network (scGCN), to achieve effective knowledge transfer across disparate datasets. Benchmarked against other label transfer methods on a total of 30 single-cell omics datasets, scGCN has consistently demonstrated superior accuracy in leveraging cells from different tissues, platforms, and species, as well as cells profiled at different molecular layers. scGCN is implemented as an integrated workflow in Python, which is available at https://github.com/QSong-github/scGCN. Copyright belongs to original authors. Visit the link for more info
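The basic operation inside any graph convolutional network is neighborhood averaging with a learned projection: each cell's features are mixed with those of its graph neighbors before classification. A minimal NumPy sketch of the standard propagation rule (toy graph and random weights, not scGCN's code):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer: H = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric degree normalisation
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(5)
A = np.array([[0, 1, 0, 0],                   # toy 4-cell graph, e.g. a kNN
              [1, 0, 1, 0],                   # graph over expression profiles
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 6))                   # per-cell expression features
W = rng.normal(size=(6, 3))                   # learned projection (random here)

H = gcn_layer(A, X, W)
print(H.shape)  # (4, 3)
```

In a label-transfer setting like scGCN's, the graph links cells within and across datasets, so labels from the reference dataset propagate through shared neighborhoods to the unlabeled query cells.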

PaperPlayer biorxiv neuroscience
Modeling the hemodynamic response function using simultaneous EEG-fMRI data and convolutional sparse coding analysis with rank-1 constraints

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Sep 10, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.09.290296v1?rss=1 Authors: Prokopiou, P., Kassinopoulos, M., Xifra-Porxas, A., Boudrias, M.-H., Mitsis, G. D. Abstract: Over the last few years, an increasing body of evidence points to the hemodynamic response function as an important confound of resting-state functional connectivity. Several studies in the literature proposed using blind deconvolution of resting-state fMRI data to retrieve the HRF, which can be subsequently used for hemodynamic deblurring. A basic hypothesis in these studies is that relevant information of the resting-state brain dynamics is condensed in discrete events resulting in large amplitude peaks in the BOLD signal. In this work, we showed that important information of resting-state activity, in addition to the larger amplitude peaks, is also concentrated in lower amplitude peaks. Moreover, due to the strong effect of physiological noise and head motion on the BOLD signal, which in many cases may not be completely removed after preprocessing, the neurophysiological origin of the large amplitude BOLD signal peaks is questionable. Hence, focusing on the large amplitude BOLD signal peaks may yield biased HRF estimates. To define discrete events of neuronal origins, we proposed using simultaneous EEG-fMRI along with convolutional sparse coding analysis. Our results suggested that events detected in the EEG are able to describe the slow oscillations of the BOLD signal and to obtain consistent HRF shapes across subjects under both task-based and resting-state conditions. Copy rights belong to original authors. Visit the link for more info
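The underlying model here is that the BOLD signal is (approximately) a convolution of a discrete neuronal event train with an HRF. The sketch below uses the canonical double-gamma HRF as a stand-in shape; note the paper's point is to estimate subject-specific HRFs from EEG-defined events rather than to assume this canonical form:

```python
import numpy as np

def canonical_hrf(t, a1=6.0, a2=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF sampled at times t (a common canonical shape)."""
    from math import gamma
    h = (t ** (a1 - 1) * np.exp(-t) / gamma(a1)
         - ratio * t ** (a2 - 1) * np.exp(-t) / gamma(a2))
    return h / h.max()

dt = 0.5                                   # sampling step in seconds
t = np.arange(0, 30, dt)
hrf = canonical_hrf(t)

events = np.zeros(120)                     # 60 s of scan time at 2 Hz
events[[10, 50, 90]] = 1.0                 # discrete neuronal events

bold = np.convolve(events, hrf)[: len(events)]   # predicted BOLD time course
print(bold.argmax())                       # peaks ~5 s after the first event
```

Deconvolution runs this model in reverse: given the BOLD signal and (EEG-derived) event times, recover the HRF kernel, which can then be used for hemodynamic deblurring.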

PaperPlayer biorxiv bioinformatics
Accurately Clustering Single-cell RNA-seq data by Capturing Structural Relations between Cells through Graph Convolutional Network

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Sep 3, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.02.278804v1?rss=1 Authors: Zeng, Y., Zhou, X., Rao, J., Lu, Y., Yang, Y. Abstract: Recent advances in single-cell RNA sequencing (scRNA-seq) technologies provide a great opportunity to study gene expression at cellular resolution, and scRNA-seq analyses have been routinely conducted to unfold cell heterogeneity and diversity. A critical step in scRNA-seq analysis is to cluster cells of the same type, and many methods have been developed for cell clustering. However, existing clustering methods are limited to extracting representations from the expression data of individual cells, while ignoring the high-order structural relations between cells. Here, we propose a new method (GraphSCC) to cluster cells based on scRNA-seq data by accounting for structural relations between cells through a graph convolutional network. The representation learned from the graph convolutional network, together with another representation output from a denoising autoencoder network, are optimized by a dual self-supervised module for better cell clustering. Extensive experiments indicate that the GraphSCC model outperforms state-of-the-art methods in various evaluation metrics on both simulated and real datasets. Further visualizations show that GraphSCC provides representations with better intra-cluster compactness and inter-cluster separability. Copyright belongs to original authors. Visit the link for more info

PaperPlayer biorxiv neuroscience
Three-dimensional convolutional autoencoder extracts features of structural brain images with a diagnostic label-free approach: Application to schizophrenia datasets

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Aug 25, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.24.213447v1?rss=1 Authors: Yamaguchi, H., Hashimoto, Y., Sugihara, G., Miyata, J., Murai, T., Takahashi, H., Honda, M., Hishimoto, A., Yamashita, Y. Abstract: There has been increasing interest in performing psychiatric brain imaging studies using deep learning. However, most studies in this field disregard three-dimensional (3D) spatial information and target disease discrimination without considering the genetic and clinical heterogeneity of psychiatric disorders. The purpose of this study was to investigate the efficacy of a 3D convolutional autoencoder (CAE) for extracting features related to psychiatric disorders without diagnostic labels. The network was trained using a Kyoto University dataset including 82 patients with schizophrenia (SZ) and 90 healthy subjects (HS), and was evaluated using Center for Biomedical Research Excellence (COBRE) datasets including 71 SZ patients and 71 HS. The proposed 3D-CAE successfully reconstructed high-resolution 3D structural magnetic resonance imaging (MRI) scans with sufficiently low error. In addition, the features extracted using the 3D-CAE retained the relevant clinical information. We explored the appropriate hyperparameter range of the 3D-CAE, and the results suggested that a model with eight convolution layers might be appropriate for extracting features for predicting the dose of medication and symptom severity in schizophrenia. Copyright belongs to original authors. Visit the link for more info

PaperPlayer biorxiv neuroscience
Learning Patterns of the Ageing Brain in MRI using Deep Convolutional Networks

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Aug 17, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.17.253732v1?rss=1 Authors: Dinsdale, N. K., Bluemke, E., Smith, S. M., Arya, Z., Vidaurre, D., Jenkinson, M., Namburete, A. I. L. Abstract: Both normal ageing and neurodegenerative diseases cause morphological changes to the brain. Age-related brain changes are subtle, nonlinear, and spatially and temporally heterogeneous, both within a subject and across a population. Machine learning models are particularly suited to capture these patterns and can produce a model that is sensitive to changes of interest, despite the large variety in healthy brain appearance. In this paper, the power of convolutional neural networks (CNNs) and the rich UK Biobank dataset, the largest database currently available, are harnessed to address the problem of predicting brain age. We developed a 3D CNN architecture to predict chronological age, using a training dataset of 12,802 T1-weighted MRI images and a further 6,885 images for testing. The proposed method shows competitive performance on age prediction, but, most importantly, the CNN prediction errors (ΔBrain Age = AgePredicted - AgeTrue) correlated significantly with many clinical measurements from the UK Biobank in the female and male groups. In addition, having used images from only one imaging modality in this experiment, we examined the relationship between ΔBrain Age and the image-derived phenotypes (IDPs) from all other imaging modalities in the UK Biobank, showing correlations consistent with known patterns of ageing. Furthermore, we show that the use of nonlinearly registered images to train CNNs can lead to the network being driven by artefacts of the registration process and missing subtle indicators of ageing, limiting the clinical relevance. Due to the longitudinal aspect of the UK Biobank study, in the future it will be possible to explore whether the ΔBrain Age from models such as this network was predictive of any health outcomes. 
Copy rights belong to original authors. Visit the link for more info

PaperPlayer biorxiv bioinformatics
VirPreNet: A weighted ensemble convolutional neural network for the virulence prediction of influenza A virus using all 8 segments

PaperPlayer biorxiv bioinformatics

Play Episode Listen Later Jul 31, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.31.230904v1?rss=1 Authors: Yin, R., Luo, Z., Zhuang, P., Lin, Z., Kwoh, C. K. Abstract: Motivation: Influenza viruses persistently threaten public health, causing annual epidemics and sporadic pandemics. The evolution of influenza viruses remains the main obstacle to the effectiveness of antiviral treatments, due to rapid mutations. Previous work has investigated the determinants of virulence of the influenza A virus. To further facilitate flu surveillance, explicit detection of influenza virulence is crucial to protect public health from potential future pandemics. Results: In this paper, we propose VirPreNet, a weighted ensemble convolutional neural network for the virulence prediction of influenza A viruses that uses all 8 segments. First, mouse lethal dose 50 is used to label the virulence of infections into two classes, namely avirulent and virulent. A numerical representation of amino acids named ProtVec is applied to the 8 segments in a distributed manner to encode the biological sequences. After splitting and embedding the influenza strains, an ensemble convolutional neural network is constructed as the base model on the influenza dataset of each segment, which serves as the main part of VirPreNet. Followed by a linear layer, the initial predictive outcomes are integrated and assigned different weights for the final prediction. The experimental results on the collected influenza dataset indicate that VirPreNet achieves state-of-the-art performance, combining ProtVec with our proposed architecture. It outperforms baseline methods on independent testing data. Moreover, our proposed model reveals the importance of the PB2 and HA segments for virulence prediction. We believe that our model may provide new insights into the investigation of influenza virulence. Copyright belongs to original authors. Visit the link for more info
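The final weighted-ensemble step, combining one per-segment prediction per base model, reduces to a weighted sum of 8 probabilities per strain. The sketch below uses hypothetical weights chosen only to mirror the paper's finding that PB2 and HA matter most; in VirPreNet the weights are learned:

```python
import numpy as np

rng = np.random.default_rng(7)

# Per-segment virulence probabilities from 8 base CNNs for 5 strains (toy values)
segment_probs = rng.random((5, 8))

# Hypothetical weights emphasising PB2 (index 0) and HA (index 3) in the
# standard segment order PB2, PB1, PA, HA, NP, NA, M, NS
raw_w = np.array([2.0, 0.2, 0.2, 2.0, 0.2, 0.2, 0.2, 0.2])
weights = raw_w / raw_w.sum()                 # normalise to sum to 1

ensemble = segment_probs @ weights            # weighted combination per strain
prediction = (ensemble >= 0.5).astype(int)    # virulent (1) vs avirulent (0)
print(ensemble.shape, prediction)
```

Inspecting the learned weights after training is what lets the authors attribute predictive importance to individual segments.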

PaperPlayer biorxiv neuroscience
Synaptic dynamics as convolutional units

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Jun 5, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.06.04.133892v1?rss=1 Authors: Rossbroich, J., Trotter, D., Toth, K., Naud, R. Abstract: Synaptic dynamics differ markedly across connections and strongly regulate how action potentials are communicated. To model the range of synaptic dynamics observed in experiments, we develop a flexible mathematical framework based on a linear-nonlinear operation. This model can capture various experimentally observed features of synaptic dynamics and different types of heteroskedasticity. Despite its conceptual simplicity, we show it is more adaptable than previous models. Combined with a standard maximum likelihood approach, synaptic dynamics can be accurately and efficiently characterized using naturalistic stimulation patterns. These results make explicit that synaptic processing bears algorithmic similarities with information processing in convolutional neural networks. Copy rights belong to original authors. Visit the link for more info
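The linear-nonlinear operation described in the abstract can be sketched in a few lines: convolve a presynaptic spike train with a linear kernel, then apply a pointwise nonlinearity. The spike train, decaying kernel, and sigmoid readout below are illustrative choices, not the paper's fitted model.

```python
import math

def linear_nonlinear_synapse(spikes, kernel, nonlinearity):
    # Linear stage: causal convolution of the spike train with a kernel.
    drive = [
        sum(kernel[j] * spikes[t - j] for j in range(len(kernel)) if t - j >= 0)
        for t in range(len(spikes))
    ]
    # Nonlinear stage: pointwise readout of the filtered drive.
    return [nonlinearity(d) for d in drive]

spikes = [0, 1, 0, 0, 1, 1, 0, 0]                # toy spike train
kernel = [math.exp(-j / 2.0) for j in range(5)]  # decaying efficacy kernel
sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

efficacy = linear_nonlinear_synapse(spikes, kernel, sigmoid)
print(len(efficacy))  # prints 8: one efficacy value per time bin
```

The convolution is what gives the model its "convolutional unit" flavor: recent spikes contribute to the current drive with kernel-weighted decay.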

PaperPlayer biorxiv neuroscience
CoNNECT: Convolutional Neural Network for Estimating synaptic Connectivity from spike Trains

PaperPlayer biorxiv neuroscience

Play Episode Listen Later May 5, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.05.05.078089v1?rss=1 Authors: Endo, D., Kobayashi, R., Bartolo, R., Averbeck, B. B., Sugase-Miyamoto, Y., Hayashi, K., Kenji, K., Richmond, B. J., Shinomoto, S. Abstract: The recent increase in reliable, simultaneous, high-channel-count extracellular recordings is exciting for physiologists and theoreticians, because it offers the possibility of reconstructing the underlying neuronal circuits. We recently presented a method of inferring this circuit connectivity from neuronal spike trains by applying the generalized linear model to cross-correlograms, GLMCC. Although the GLMCC algorithm can do a good job of circuit reconstruction, the parameters need to be carefully tuned for each individual dataset. Here we present another algorithm using a convolutional neural network for estimating synaptic connectivity from spike trains, CoNNECT. After adaptation to very large amounts of simulated data, this algorithm robustly captures the specific feature of monosynaptic impact in a noisy cross-correlogram. There are no user-adjustable parameters. With this new algorithm, we have constructed diagrams of neuronal circuits recorded in several cortical areas of monkeys. Copy rights belong to original authors. Visit the link for more info

Machine Learning in Healthcare, by Skin Analytics
#005: Causation & Correlation with Artificial Intelligence

Machine Learning in Healthcare, by Skin Analytics

Play Episode Listen Later Mar 16, 2020 15:49


This week, Neil is joined by Dr. Jack Greenhalgh, AI director at Skin Analytics, to discuss causation and correlation. Featured this week: the difference between causation and correlation and their relevance when it comes to AI; how death by drowning in swimming pools correlates with movies starring Nicolas Cage and why that's relevant when it comes to understanding machine learning; the risk of overfitting and the best ways to avoid falling into its traps; convolutional neural networks and Yann LeCun's tweets; and, once again, why prospective studies are so important. Visit us at: https://skin-analytics.com/ Get in touch with Neil: neil@skinanalytics.co.uk | Linkedin: www.linkedin.com/in/ndaly/

JACC Podcast
Detection of Hypertrophic Cardiomyopathy Using a Convolutional Neural Network-Enabled Electrocardiogram

JACC Podcast

Play Episode Listen Later Feb 18, 2020 9:50


Underserved
Ep. 012, Rise of the machines

Underserved

Play Episode Listen Later Nov 20, 2019 38:50


How many people started their data science career at 16 years old? Then took a sabbatical to play with famous jazz musicians? And subsequently bootstrapped the US presence for a Norwegian data science firm? I know one! His name is Russ Wilcox. On this week's Underserved, Russ explains how a Tesla sees the world (and how it can be fooled) and how to teach an old AI new tricks, and we discuss the ethical implications of artificial intelligence. If you ever wondered how machine learning worked, or whether the rise of SKYNET is imminent, this is the podcast for you.   Russ's company, Sannsyn US Sannsyn.com/us   Gradient Descent: https://en.wikipedia.org/wiki/Gradient_descent  Convolutional neural networks: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks  Anders Hanssen: https://www.mn.uio.no/math/personer/vit/anderch/  Tensorflow: https://www.tensorflow.org/ Keras: https://keras.io/  Pytorch: https://pytorch.org/  Comparison of TF, Keras, PT: https://dzone.com/articles/tensorflow-vs-pytorch-vs-keras-for-nlp  Anthony Fung: https://www.anthonyfungmusic.com/  George Garzone: http://www.georgegarzone.com/  The Russ Wilcox Jazz Group : https://www.facebook.com/rwilcoxjazz/  
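For listeners wondering how the gradient descent linked in the show notes actually works, here is a minimal sketch on a toy problem; the quadratic loss and learning rate below are illustrative choices, not anything specific to the episode.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly step opposite the gradient; lr controls the step size.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges to 3.0
```

Training a neural network uses the same loop, just with a much higher-dimensional parameter vector and a gradient computed by backpropagation.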

Google Cloud Platform Podcast
Human-Centered AI with Di Dang

Google Cloud Platform Podcast

Play Episode Listen Later May 7, 2019 38:04


Mark Mirchandani and Michelle Casbon take over the show this week to discuss AI and the PAIR Guidebook to Human-Centered AI. Mark Mandel pops in on the interview, and Di Dang, Design Advocate at Google, talks about her role in designing and building the guidebook with the intent of helping others create quality AI projects. Di describes human-centered AI as a practice of not only being conscious of the project being built, but also considering how this AI project will impact us as humans at the end of the day. We influence machine learning so much, both intentionally and unintentionally, and it’s our job to look at the project and results as a whole. In the guidebook, topics like data bias in machine learning, what design patterns work, how to establish trust with the user, and more are addressed. Di explains that the guidebook is a work in progress that will develop with input from users and advances in technology. Di Dang Di Dang recently joined Google’s Design Relations team as a Design Advocate supporting emerging technologies such as augmented reality and machine learning. Previously, she worked as a Senior UX Designer and led the Emerging Tech group at Seattle-based digital agency POP, advising clients on how VR/AR, web/mobile, conversational UI, and machine learning could benefit their end users. With a degree in Philosophy and Religion, she considers herself an optimistic realist who is passionate about ethical design. You can find Di onstage doing improv or on Twitter @dqpdang. 
Cool things of the week
- Bringing the best of open source to Google Cloud customers blog
- James Ward’s Cloud Run button site
- Michelle’s favorite codelabs from I/O: TPU-speed data pipelines site, Your first Keras model site, Convolutional neural networks site, Modern convnets, squeezenet, with Keras with TPUs site
Interview
- People + AI Guidebook site
- PAIR site
- GCP Podcast Episode 114: Machine Learning Bias and Fairness with Timnit Gebru and Margaret Mitchell podcast
- Machine Learning Crash Course site
- Google Clips site
- Google Brain Team site
Question of the week
- How do I get started with practical AI? Build an Appointment Scheduler Chatbot with Dialogflow
Where can you find us next?
- Michelle will be at Google I/O and Kubecon Europe. No I/O event in your area? You can host one!

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Deep Learning for Population Genetic Inference with Dan Schrider - TWiML Talk #249

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Apr 8, 2019 49:53


Today we’re joined by Dan Schrider, assistant professor in the department of genetics at The University of North Carolina at Chapel Hill. My discussion with Dan starts with an overview of population genomics and from there digs into his application of machine learning in the field, allowing us to, for example, better understand population size changes and gene flow from DNA sequences. We then dig into Dan’s paper “The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference,” published in Molecular Biology and Evolution, which examines the idea that CNNs are capable of outperforming expert-derived statistical methods for some key problems in the field. Thanks to Pegasystems for sponsoring today's show! I'd like to invite you to join me at PegaWorld, the company’s annual digital transformation conference, which takes place this June in Las Vegas. To learn more about the conference or to register, visit pegaworld.com and use TWIML19 in the promo code field when you get there for $200 off. The complete show notes for this episode can be found at https://twimlai.com/talk/249.

Misreading Chat
#52 – Convolutional Color Constancy

Misreading Chat

Play Episode Listen Later Mar 7, 2019


Morita talks about a machine-learning-based white balance algorithm.

Dataspaning
#9 AI in India, US and Japan - with Gautam Bajaj

Dataspaning

Play Episode Listen Later Dec 5, 2018 57:35


This episode features Gautam Bajaj, an engineer and data scientist who has worked with technologies mostly related to AI and machine learning. He has worked in India, the USA, and at a leading video game company in Japan, and is now consulting in Tokyo. We talk about getting into the field of data science and AI and about the different working cultures in Japan, India, the USA, and Sweden. We get into the possibilities and challenges of AI and machine learning, such as scaling and keeping up with developments in such a rapidly evolving field. Some things mentioned in this episode:
- Yann LeCun, AI course
- Hadoop: a framework for distributed processing and big data
- Kubernetes: a tool for running applications in "the cloud"
- Medium, a blog platform
- Recommender/recommendation system
- Convolutional neural network (CNN)
- Reinforcement learning
- Generative adversarial networks (GANs)
- Python
- Amazon Web Services (AWS): cloud, web hosting etc.
- Google Cloud Platform (GCP): cloud, web hosting etc.
We do not currently have any external partners and all opinions expressed are solely our own. Nothing discussed on this podcast should be considered as any kind of investment advice. In this episode:
- Gautam Bajaj, gautam1237 at gmail dot com
- Martin Nordgren, works at Tobii, former engineer at Dirac, @martinjnordgren
Contact us:
- dataspaning.se
- @dataspaning @ Twitter
- dataspaning@gmail.com

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Graph Analytic Systems with Zachary Hanif - TWiML Talk #188

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Oct 8, 2018 55:29


In this, the final episode of our Strata Data Conference series, we’re joined by Zachary Hanif, Director of Machine Learning at Capital One’s Center for Machine Learning. Zach led a session at Strata called “Network effects: Working with modern graph analytic systems,” which we had a great chat about back in New York. We start our discussion with a look at the role of graph analytics in the machine learning toolkit, including some important application areas for graph-based systems. We continue with an overview of the different ways to implement graph analytics, with a particular emphasis on the emerging role of what he calls graphical processing engines, which excel at handling large datasets. We also discuss the relationship between these kinds of systems and probabilistic graphical models, graphical embedding models, and graph convolutional networks in deep learning. The complete show notes for this episode can be found at twimlai.com/talk/188. For more information on the Strata Data Conference series, visit twimlai.com/stratany2018.

The AI Element
Startups vs. Traditional Industry

The AI Element

Play Episode Listen Later Jul 24, 2018 25:39


As AI seeps into every industry, businesses are being forced to adapt. Old school industries may not be as lean or quick to pivot as startups, but they have access to a motherlode of funding and data. Still, red tape and outdated infrastructure may block them from the timely AI transformation they need to stay afloat tomorrow. Alex Shee speaks with serial AI entrepreneur JF Gagné about this tension between startups and more corporate environments. Then, 15-year veteran of the insurance industry Natacha Mainville shares some real-world examples of how AI is flipping the industry on its head, forcing incumbents to keep up. Featured in this episode: JF Gagné, CEO of Element AI Natacha Mainville, Chief Innovation Officer at TandemLaunch Mentioned in the episode: JDA Software, retail and supply chain solutions Element AI, AI solutions provider Convolutional neural networks (Wikipedia) Lemonade Renters & Home Insurance, insurance startup

Linear Digressions
A Conceptual Introduction to Capsule Networks

Linear Digressions

Play Episode Listen Later Apr 8, 2018 14:05


Convolutional nets are great for image classification... if this were 2016. But it's 2018 and Canada's greatest neural networker Geoff Hinton has some new ideas, namely capsule networks. Capsule nets are a completely new type of neural net architecture designed to do image classification on far fewer training cases than convolutional nets, and they're posting results that are competitive with much more mature technologies. In this episode, we'll give a light conceptual introduction to capsule nets and get geared up for a future episode that will do a deeper technical dive.

Linear Digressions
Convolutional Neural Nets

Linear Digressions

Play Episode Listen Later Apr 1, 2018 21:55


If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tasks.

ZADevChat Podcast
71 - Is Intelligence an Algorithm? With Jade Abbott

ZADevChat Podcast

Play Episode Listen Later Aug 29, 2017 70:49


We chat to Jade Abbott from Retro Rabbit about artificial intelligence, broadly and more specifically about NLP and what that means for us. Chantal, Kenneth & Len talk to Jade about natural language processing, commonly referred to as NLP. What does it take to get a machine to understand what we're saying as people? Jade has always had a fascination with smart machines, from trying to build robots in school and now teaching machines to understand what we're saying. Jade takes a fairly complex topic and helps us come to terms with it. We question whether people, or intelligence, is algorithmic and what that means. Processing natural language is not without challenges and Jade walks us through the maze of terminology and some tools to get started with, and we have several resources below to help as well. What would happen if an AI tried to write a movie? What happens if the movie gets made? Importantly, neural nets are not the whole of AI. We wander around expert systems, random forests, and other great statistical models that are very useful and predictable. Is intelligence just an algorithm? What do you think? Let us know! Find and follow Jade online: * https://twitter.com/alienelf * http://github.com/jaderabbit * https://twitter.com/fmfyband * https://fmfy.bandcamp.com/ Jade has some repos with sample projects on GitHub: * https://github.com/jaderabbit/botcon2016 * https://github.com/jaderabbit/deepdreamsofelectricsheep * https://www.kaggle.com/jaderabbit/training-an-lstm-to-write-songs Jade offers some great resources not specifically covered in the show. 
For people looking to get into AI: * https://www.coursera.org/learn/machine-learning - Andrew Ng's Coursera Machine Learning course * Kaggle - http://www.kaggle.com/ * https://medium.freecodecamp.org/the-best-data-science-courses-on-the-internet-ranked-by-your-reviews-6dc5b910ea40 How Neural Networks Really Work: * https://www.youtube.com/watch?v=EInQoVLg_UY&t=78s How Neural Networks Really Work by Geoffrey Hinton Using Natural Language for AI * The key blog post on using deep learning for Natural Language Processing: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ * Beautiful tutorial on Word2Vec http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/ * My kaggle notebook for training a neural network to generate songs: https://www.kaggle.com/jaderabbit/training-an-lstm-to-write-songs Here are some resources mentioned in the show: * Ex Machina - https://en.wikipedia.org/wiki/Ex_Machina_(film) * Alien Covenant - https://en.wikipedia.org/wiki/Alien:_Covenant * Marvin from Hitchhikers Guide - https://en.wikipedia.org/wiki/Marvin_(character) * Sherlock - https://en.wikipedia.org/wiki/Sherlock_(TV_series) * Natural Language Processing - https://en.wikipedia.org/wiki/Natural_language_processing * word2vec - https://en.wikipedia.org/wiki/Word2vec * Convolutional neural network - https://en.wikipedia.org/wiki/Convolutional_neural_network * Recurrent neural network - https://en.wikipedia.org/wiki/Recurrent_neural_network * Kaggle - https://www.kaggle.com/ * Sunspring | A Sci-Fi Short Film - https://www.youtube.com/watch?v=LY7x2Ihqjmc * Rabbiteer - https://rabbiteer.io/ And finally our picks Jade: * Kaggle - https://www.kaggle.com * Sunspring | A Sci-Fi Short Film - https://www.youtube.com/watch?v=LY7x2Ihqjmc * Creativity: how is AI impacting this human skill? - http://bit.ly/2wjumoD Chantal: * For Computers, Too, It's Hard to Learn to Speak Chinese - http://bit.ly/2graOJr Kenneth: * Westworld - https://en.wikipedia.org/wiki/Westworld_(TV_series) Len: * Instaparse - https://github.com/Engelberg/instaparse Thanks for listening! Stay in touch: * Website & newsletter - https://zadevchat.io * Socialize - https://twitter.com/zadevchat & http://facebook.com/ZADevChat/ * Suggestions and feedback - https://github.com/zadevchat/ping * Subscribe and rate in iTunes - http://bit.ly/zadevchat-itunes

Machine Learning – Software Engineering Daily
Convolutional Neural Networks with Matt Zeiler

Machine Learning – Software Engineering Daily

Play Episode Listen Later May 10, 2017 54:37


Convolutional neural networks are a machine learning tool that uses layers of convolution and pooling to process and classify inputs. CNNs are useful for identifying objects in images and video. In this episode, we focus on the application of convolutional neural networks to image and video recognition and classification with our guest, Matt Zeiler.
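The convolution-and-pooling layers described above can be illustrated with a tiny, dependency-free sketch; the "image" and edge-detector kernel below are made up for the example.

```python
def conv2d(image, kern):
    # Valid 2-D convolution (really cross-correlation, as in most CNN libraries).
    kh, kw = len(kern), len(kern[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kern[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)] for r in range(oh)]

def max_pool(fmap, size=2):
    # Non-overlapping max pooling: keep the strongest activation per window.
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

# A 5x5 "image" with a vertical edge, and a vertical-edge detector kernel.
image = [[0, 0, 1, 1, 1]] * 5
kern = [[-1, 1], [-1, 1]]  # responds where intensity jumps left-to-right

fmap = conv2d(image, kern)  # 4x4 feature map, peaks at the edge
pooled = max_pool(fmap)     # 2x2 after pooling
print(pooled)               # [[2, 0], [2, 0]]
```

The feature map lights up exactly where the edge is, and pooling shrinks the map while keeping the strongest responses, which is the core mechanic of the layers discussed in the episode.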

Rebuild
181: UNK Reply Bot (higepon)

Rebuild

Play Episode Listen Later May 1, 2017 70:24


We welcomed Taro Minowa as a guest and talked about bots, machine learning, AI, and more. Show Notes "I tried running a seq2seq chatbot in Japanese" - Higepon’s blog ひげみbot (@higepon_bot) Convolutional neural network Sequence-to-Sequence Models Deep Learning from Scratch (ゼロから作るDeep Learning: theory and implementation of deep learning, learned with Python) TensorFlow Keras Theano Chainer "I don't get it. Wouldn't it have been better to use Keras from the start? So typically Japanese. You like Chainer way too much." MeCab: Yet Another Part-of-Speech and Morphological Analyzer Rinna (りんな) Twitter taught Microsoft’s AI chatbot to be a racist asshole deepmind/sonnet: TensorFlow-based neural network library FaceApp apologizes for building a racist AI Google Photos labeled black people 'gorillas' "Anime studio investigates Everfilter over unauthorized use of footage from director Makoto Shinkai's films" Is Expensify using Mechanical Turk for reading my receipts? Introducing Echo Look - Hands-Free Camera and Style Assistant Your Samsung TV is eavesdropping on your private conversations Google Home now supports multiple users Google shuts down Burger King's cunning TV ad Facebook is developing a way to read your mind

Rebuild
169: Your Blog Can Be Generated By Neural Networks (omo)

Rebuild

Play Episode Listen Later Dec 25, 2016 105:57


We welcomed Hajime Morita as a guest and talked about The Pragmatic Programmer and more. Show Notes Rebuild: Supporter Naoya Ito: "A bad industry habit: recommending newcomers the 10 or 20 books you read yourself" The Pragmatic Programmer: From Journeyman to Master, new Japanese edition (新装版 達人プログラマー) | Amazon The Pragmatic Programmer, new Japanese edition | Ohmsha eBook Store The Pragmatic Bookshelf Convolutional neural network Rational Unified Process UML (Unified Modeling Language) PlantUML Working Effectively with Legacy Code (レガシーコード改善ガイド) Add Code from a Template | Android Studio Protocol Buffers Amazon Athena Sumo Logic Splunk jq Becky! Internet Mail Wanderlust Refactoring: Improving the Design of Existing Code (リファクタリング) Code Complete, 2nd Edition, Vol. 1 (CODE COMPLETE 第2版 上) Error handling and Go Thinking in React - React Design Patterns Martin Fowler The UNIX Philosophy (UNIXという考え方) Takuto Wada: "As required reading for young engineers, I named The Pragmatic Programmer and The UNIX Philosophy first" "100 technical books for new-graduate software engineers" - Cookpad Developer Blog The Best Software Writing I: Selected and Introduced by Joel Spolsky steps to phantasien

Learning Machines 101
LM101-059: How to Properly Introduce a Neural Network

Learning Machines 101

Play Episode Listen Later Dec 20, 2016 29:56


I discuss the concept of a “neural network” by providing some examples of recent successes in neural network machine learning algorithms and providing a historical perspective on the evolution of the neural network concept from its biological origins. For more details visit us at: www.learningmachines101.com  

More Than Just Code podcast - iOS and Swift development, news and advice

We start out discussing the iOS update 9.3.5, which prevents a remote jailbreak of iOS devices that gave bad guys access to users' personal data and messages. We also follow up on Touch IC chips popping off iPhones, Star Wars serialization, neural networks, the shutdown of sync and open sourcing of Vesper, the merger of Instapaper with Pinterest, and Nest being moved internally into Google. Pro Tip: you may need to clean out your lightning port. We discuss the EU's claim that Apple underpaid on corporate taxes in Ireland. We discuss whether Griffin's Bluetooth adapter for cabled headphones indicates the iPhone 7's rumored lack of a mini-phone jack. Aaron expounds on his new CarPlay-enabled car. Picks: Charles Proxy v4, FLEX, Classic CHM YouTube channel, How Snapchat’s filters work, Papa John’s Apple TV app, Swift Algorithm Club tutorials.
Episode 107 Show Notes:
- Convolutional neural networks on the iPhone with VGGNet
- Apple Security Update for iOS
- Who are the hackers who cracked the iPhone?
- These are the 25 internet passwords you must not use
- C4 (conference)
- Jonathan (Wolf) Rensch
- The Serial: From Dickens To Star Wars
- Tony Fadell
- Brian Hayes MEP on As It Happens (starts at 10:45, ends 18:06)
- iPhone owners sue Apple over 'Touch Disease'
- Vesper Sync Shutdown Tonight, Open Source Plans
- Instapaper engineer Brian Donohue is tweeting up a storm reassuring people
- Google Will Absorb Nest Developers
- Apple CEO Tim Cook: $14.5 billion EU tax bill has ‘no basis in fact or in law’
- A Message to the Apple Community in Europe - Tim Cook
- Griffin's Bluetooth adapter is early bet on iPhone 7 removing headphone jack
- Jaybird Sport
- Meet the Pilot: Smart Earpiece Language Translator
- CarPlay for Developers
- The Best Lightning Cable
Episode 107 Picks:
- Charles Proxy v4
- FLEX
- Classic CHM YouTube channel
- How Snapchat’s filters work
- Papa John’s Apple TV app
- Swift Algorithm Club: Swift Binary Search Tree Data Structure

Principles of Digital Communication II
Lecture 13: Introduction to Convolutional Codes

Principles of Digital Communication II

Play Episode Listen Later Jun 16, 2015 81:48


Introduction to Convolutional Codes

Principles of Digital Communication II
Lecture 14: Introduction to Convolutional Codes

Principles of Digital Communication II

Play Episode Listen Later Jun 16, 2015 82:15


Introduction to Convolutional Codes

Learning Machines 101
LM101-030: How to Improve Deep Learning Performance with Artificial Brain Damage (Dropout and Model Averaging)

Learning Machines 101

Play Episode Listen Later Jun 8, 2015 32:02


Deep learning machine technology has rapidly developed over the past five years due in part to a variety of factors such as: better technology, convolutional net algorithms, rectified linear units, and a relatively new learning strategy called "dropout" in which hidden unit feature detectors are temporarily deleted during the learning process. This episode introduces and discusses the concept of "dropout" as a way to improve deep learning performance, and connects "dropout" to the concepts of regularization and model averaging. For more details and background references, check out: www.learningmachines101.com !  
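The "dropout" strategy described above can be sketched in a few lines; this uses the common inverted-dropout scaling, and the hidden activation values below are made up for the example.

```python
import random

def dropout(activations, p, rng):
    # Training-time (inverted) dropout: zero each unit with probability p,
    # scaling survivors by 1/(1-p) so the expected activation is unchanged.
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden = [0.5, 1.2, -0.3, 0.8, 2.0]

# Averaging many dropped-out passes approximates the full model's output,
# which is the model-averaging interpretation mentioned in the episode.
n = 10000
avg = [sum(col) / n
       for col in zip(*(dropout(hidden, 0.5, rng) for _ in range(n)))]
print(avg)  # each entry is close to the corresponding original activation
```

Each pass trains a different random "thinned" network, and the average over many such networks recovers the original activations in expectation.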

Learning Machines 101
LM101-029: How to Modernize Deep Learning with Rectilinear units, Convolutional Nets, and Max-Pooling

Learning Machines 101

Play Episode Listen Later May 25, 2015 35:59


This podcast discusses talks, papers, and ideas presented at the recent International Conference on Learning Representations 2015 which was followed by the Artificial Intelligence in Statistics 2015 Conference in San Diego. Specifically, commonly used techniques shared by many successful deep learning algorithms such as: rectilinear units, convolutional filters, and max-pooling are discussed. For more details please visit our website at: www.learningmachines101.com! 
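The rectilinear (ReLU) units and max-pooling discussed in the episode are simple enough to sketch directly; the activation values below are illustrative.

```python
def relu(x):
    # Rectified linear unit: passes positive inputs, zeroes out negatives.
    return max(0.0, x)

def max_pool_1d(xs, size=2):
    # 1-D max pooling over non-overlapping windows.
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

pre_activations = [-2.0, 0.5, 3.0, -0.1, 1.5, -4.0]
activations = [relu(x) for x in pre_activations]
print(activations)               # [0.0, 0.5, 3.0, 0.0, 1.5, 0.0]
print(max_pool_1d(activations))  # [0.5, 3.0, 1.5]
```

The ReLU keeps gradients from vanishing for positive inputs, and pooling discards everything but the strongest local response, two of the shared ingredients the episode attributes to successful deep learning systems.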

Learning Machines 101
LM101-023: How to Build a Deep Learning Machine

Learning Machines 101

Play Episode Listen Later Feb 23, 2015 42:45


Recently, there has been a lot of discussion and controversy over the currently hot topic of “deep learning”!! Deep Learning technology has made real and important fundamental contributions to the development of machine learning algorithms. Learn more about the essential ideas of  "Deep Learning" in Episode 23 of "Learning Machines 101". Check us out at our official website: www.learningmachines101.com ! 

Introduction to EECS II: Digital Communication Systems

This lecture starts with historical applications of error control and convolutional codes in space programs. Convolutional codes are introduced along with examples and transformations in shift-register, state-machine, and trellis view.
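The shift-register view of a convolutional encoder mentioned above can be sketched in a few lines. This uses the classic rate-1/2, constraint-length-3 code with generator polynomials 7 and 5 in octal, a standard textbook choice rather than one specific to this lecture.

```python
def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    # Rate-1/2 convolutional encoder in shift-register view: each input bit
    # produces two output bits, each an XOR of taps selected by a generator.
    state = [0] * (len(g1) - 1)  # shift register contents
    out = []
    for b in bits:
        window = [b] + state     # current bit followed by register contents
        out.append(sum(g * w for g, w in zip(g1, window)) % 2)
        out.append(sum(g * w for g, w in zip(g2, window)) % 2)
        state = window[:-1]      # shift the register by one position
    return out

msg = [1, 0, 1, 1]
print(conv_encode(msg))  # [1, 1, 1, 0, 0, 0, 0, 1]: 8 output bits, rate 1/2
```

The state of the register at each step is exactly the state in the state-machine view, and unrolling those states over time gives the trellis used for Viterbi decoding.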