Podcasts about CVPR

  • 44 podcasts
  • 93 episodes
  • 47m average episode duration
  • Episode frequency: infrequent
  • Latest episode: Feb 8, 2025


Latest podcast episodes about CVPR

Machine Learning Street Talk
Want to Understand Neural Networks? Think Elastic Origami! - Prof. Randall Balestriero

Feb 8, 2025 · 78:10


Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Goto https://tufalabs.ai/
***

Randall Balestriero
https://x.com/randall_balestr
https://randallbalestriero.github.io/

Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0

TOC:
- Introduction
  - 00:00:00: Introduction
- Neural Network Geometry and Spline Theory
  - 00:01:41: Neural Network Geometry and Spline Theory
  - 00:07:41: Deep Networks Always Grok
  - 00:11:39: Grokking and Adversarial Robustness
  - 00:16:09: Double Descent and Catastrophic Forgetting
- Reconstruction Learning
  - 00:18:49: Reconstruction Learning
  - 00:24:15: Frequency Bias in Neural Networks
- Geometric Analysis of Neural Networks
  - 00:29:02: Geometric Analysis of Neural Networks
  - 00:34:41: Adversarial Examples and Region Concentration
- LLM Safety and Geometric Analysis
  - 00:40:05: LLM Safety and Geometric Analysis
  - 00:46:11: Toxicity Detection in LLMs
  - 00:52:24: Intrinsic Dimensionality and Model Control
  - 00:58:07: RLHF and High-Dimensional Spaces
- Conclusion
  - 01:02:13: Neural Tangent Kernel
  - 01:08:07: Conclusion

REFS:
[00:01:35] Humayun – Deep network geometry & input space partitioning
https://arxiv.org/html/2408.04809v1
[00:03:55] Balestriero & Paris – Linking deep networks to adaptive spline operators
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
[00:13:55] Song et al. – Gradient-based white-box adversarial attacks
https://arxiv.org/abs/2012.14965
[00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness
https://arxiv.org/abs/2402.15555
[00:18:25] Humayun – Training dynamics & double descent via linear region evolution
https://arxiv.org/abs/2310.12977
[00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries
https://arxiv.org/abs/1905.08443
[00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning
https://arxiv.org/abs/1803.03635
[00:24:00] Belkin et al. – Double descent phenomenon in modern ML
https://arxiv.org/abs/1812.11118
[00:25:55] Balestriero et al. – Batch normalization's regularization effects
https://arxiv.org/pdf/2209.14778
[00:29:35] EU – EU AI Act 2024 with compute restrictions
https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf
[00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry
https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf
[00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy
https://arxiv.org/pdf/2407.20099
[00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods
https://openreview.net/forum?id=ez7w0Ss4g9
(truncated, see shownotes PDF)
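A quick illustration of the spline framing referenced in the description and in the Balestriero & Paris / SplineCam references above: a ReLU network is a continuous piecewise-affine operator, so the input space is partitioned into regions and the network acts as a single affine map on each region. The NumPy sketch below is our own toy example (the 2-8-1 network and all names are made up for illustration; it is not code from the papers). It recovers the activation pattern that identifies a point's region and the affine map the network applies on that region, then checks the ordinary forward pass against it.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny ReLU network R^2 -> R, weights chosen at random purely for illustration.
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def forward(x):
    """Ordinary forward pass."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def region_and_affine_map(x):
    """Activation pattern identifying x's linear region, and the affine map
    (slope A, offset c) that the network applies on that entire region."""
    pattern = W1 @ x + b1 > 0              # binary code of the region x falls in
    D = np.diag(pattern.astype(float))     # ReLU as an input-dependent diagonal matrix
    A = W2 @ D @ W1                        # per-region slope
    c = W2 @ D @ b1 + b2                   # per-region offset
    return pattern, A, c

x = rng.normal(size=2)
pattern, A, c = region_and_affine_map(x)
assert np.allclose(forward(x), A @ x + c)  # exact on this region, not an approximation

# Points sharing the activation pattern share the same affine map (same spline region).
x_near = x + 1e-3 * rng.normal(size=2)
if (region_and_affine_map(x_near)[0] == pattern).all():
    assert np.allclose(forward(x_near), A @ x_near + c)
```

Counting and visualizing how these regions are laid out around data is the kind of exact geometric characterization that the SplineCam reference above performs for real networks.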

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Because of the nature of SAM, this is more video heavy than usual. See our YouTube!

Because vision is first among equals in multimodality, and yet SOTA vision language models are closed, we've always had an interest in learning what's next in vision. Our first viral episode was Segment Anything 1, and we have since covered LLaVA, IDEFICS, Adept, and Reka. But just like with Llama 3, FAIR holds a special place in our hearts as the New Kings of Open Source AI.

The list of sequels better than the originals is usually very short, but SAM 2 delighted us: not only is it a better image segmentation model than SAM 1, it also conclusively and inexpensively solved video segmentation in just as elegant a way as SAM 1 did for images, releasing everything to the community as Apache 2.0 / CC BY 4.0.

“In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM).”

Surprisingly Efficient

The paper reports that SAM 2 was trained on 256 A100 GPUs for 108 hours (59% more than SAM 1). Taking the upper-end $2/hour A100 cost off gpulist.ai means SAM 2 cost ~$50k to train if it had an external market-rate cost - surprisingly cheap for adding video understanding!

The newly released SA-V dataset is also the largest video segmentation dataset to date, with careful attention given to scene/object/geographical diversity, including that of annotators. In some ways, we are surprised that SOTA video segmentation can be done on only ~50,000 videos (and 640k masklet annotations).

Model-in-the-loop Data Engine for Annotations and Demo-first Development

Similar to SAM 1, a 3-phase data engine helped greatly in bootstrapping this dataset. As Nikhila says in the episode, the demo you see wasn't just for show; they actually used this same tool to do annotations for the model that is now demoed in the tool:

“With the original SAM, we put a lot of effort in building a high-quality demo. And the other piece here is that the demo is actually the annotation tool. So we actually use the demo as a way to improve our annotation tool. And so then it becomes very natural to invest in building a good demo because it speeds up your annotation and improves the data quality, and that will improve the model quality. With this approach, we found it to be really successful.”

An incredible 90% speedup in annotation happened due to this virtuous cycle, which helped SA-V reach this incredible scale.

Building the demo also helped the team live the context that their own downstream users, like Roboflow, would experience, and forced them to make choices accordingly. As Nikhila says:

“It's a really encouraging trend for not thinking about only the new model capability, but what sort of applications folks want to build with models as a result of that downstream. I think it also really forces you to think about many things that you might postpone. For example, efficiency. For a good demo experience, making it real time is super important. No one wants to wait. And so it really forces you to think about these things much sooner and actually makes us think about what kind of image encoder we want to use or other hardware efficiency improvements. So those kinds of things, I think, become a first-class citizen when you put the demo first.”

Indeed, the team swapped out standard ViT-H Vision Transformers for Hiera (Hierarchical) Vision Transformers as a result of efficiency considerations.

Memory Attention

Speaking of architecture, the model design is probably the sleeper hit of a project filled with hits. The team adapted SAM 1 to video by adding streaming memory for real-time video processing: specifically, memory attention, a memory encoder, and a memory bank, which surprisingly ablated better than more intuitive but complex architectures like Gated Recurrent Units.

One has to wonder if streaming memory can be added to pure language models with a similar approach… (pls comment if there's an obvious one we haven't come across yet!)
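To make the memory design above a little more concrete, here is a deliberately tiny, illustrative Python sketch of a streaming-memory loop. This is our own pseudo-implementation, not the released SAM 2 code or API; the class name and the encoder, attention, and decoder stand-ins are placeholders chosen only so the snippet runs. The structure is the point: each frame is encoded, conditioned via memory attention on a bank that mixes prompted frames with a small rolling window of recent frames, decoded into a mask, and then summarized by a memory encoder back into the bank.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Trivial stand-ins so the sketch runs end to end; in the real model these are transformer components.
def encode_frame(frame):                 # "image encoder" (Hiera in the paper)
    return frame.mean(axis=-1)
def memory_attention(feats, memories):   # "cross-attend current features to the memory bank"
    return feats + np.mean(memories, axis=0)
def mask_decoder(feats, prompt):         # "prompt + features -> mask"
    return (feats > feats.mean()).astype(float)
def memory_encoder(feats, mask):         # "summarize this frame's prediction for the bank"
    return feats * mask

class StreamingMemorySegmenter:
    """Hypothetical wrapper illustrating the structure only."""
    def __init__(self, num_recent=6):
        self.prompted_memories = []                        # frames the user clicked on: kept
        self.recent_memories = deque(maxlen=num_recent)    # rolling window of recent frames

    def step(self, frame, prompt=None):
        feats = encode_frame(frame)
        bank = self.prompted_memories + list(self.recent_memories)
        if bank:                                           # on the very first frame the bank is empty
            feats = memory_attention(feats, bank)
        mask = mask_decoder(feats, prompt)
        memory = memory_encoder(feats, mask)
        if prompt is not None:
            self.prompted_memories.append(memory)          # prompted frames are always remembered
        else:
            self.recent_memories.append(memory)
        return mask

tracker = StreamingMemorySegmenter()
video = rng.random((20, 32, 32, 3))                        # 20 random stand-in frames
masks = [tracker.step(f, prompt="click" if i == 0 else None)
         for i, f in enumerate(video)]
```

In the real model the bank holds roughly six recent frames plus every prompted frame (as discussed later in the episode), which is what lets a later refinement click propagate through the video without re-annotating from scratch.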
Video Podcast

Tune in to Latent Space TV for the video demos mentioned in this video podcast!

Timestamps

* [00:00:00] The Rise of SAM by Udio (David Ding Edit)
* [00:03:07] Introducing Nikhila
* [00:06:38] The Impact of SAM 1 in 2023
* [00:12:15] Do People Finetune SAM?
* [00:16:05] Video Demo of SAM
* [00:20:01] Why the Demo is so Important
* [00:23:23] SAM 1 vs SAM 2 Architecture
* [00:26:46] Video Demo of SAM on Roboflow
* [00:32:44] Extending SAM 2 with other models
* [00:35:00] Limitations of SAM: Screenshots
* [00:38:56] SAM 2 Paper
* [00:39:15] SA-V Dataset and SAM Data Engine
* [00:43:15] Memory Attention to solve Video
* [00:47:24] "Context Length" in Memory Attention
* [00:48:17] Object Tracking
* [00:50:52] The Future of FAIR
* [00:52:23] CVPR, Trends in Vision
* [01:02:04] Calls to Action

Transcript

[00:00:00] [music intro]

[00:02:11] AI Charlie: Happy Yoga! This is your AI co-host Charlie. Thank you for all the love for our special 1 million downloads Wins of AI Winter episode last week, especially Sam, Archie, Trellis, Morgan, Shrey, Han, and more. For this episode, we have to go all the way back to the first viral episode of the podcast, Segment Anything Model and the Hard Problems of Computer Vision, which we discussed with Joseph Nelson of Roboflow.

[00:02:39] AI Charlie: Since Meta released SAM 2 last week, we are delighted to welcome Joseph back as our fourth guest co-host to chat with Nikhila Ravi, Research Engineering Manager at Facebook AI Research and lead author of SAM 2. Just like our SAM 1 podcast, this is a multimodal pod because of the vision element, so we definitely encourage you to hop over to our YouTube at least for the demos, if not our faces.

[00:03:04] AI Charlie: Watch out and take care.

[00:03:10] Introducing Nikhila

[00:03:10] swyx: Welcome to the latest podcast. I'm delighted to do segment anything to our first, one of our very first viral podcasts was segment anything one with Joseph. Welcome back. Thanks so much. And this time we are joined by the lead author of Segment Anything 2, Nikki Ravi, welcome.

[00:03:25] Nikhila Ravi: Thank you. Thanks for having me.

[00:03:26] swyx: There's a whole story that we can refer people back to episode of the podcast way back when for the story of Segment Anything, but I think we're interested in just introducing you as a researcher, as a, on the human side what was your path into AI research? Why, you know, why did you choose computer vision coming out of your specialization at Cambridge?

[00:03:46] Nikhila Ravi: So I did my undergraduate degree in engineering at Cambridge University. The engineering program is very general.
So first couple of years, you sort of study everything from mechanical engineering to fluid mechanics, structural mechanics, material science, and also computer science.[00:04:04] Nikhila Ravi: Towards the end of my degree, I started taking more classes in machine learning and computational neuroscience, and I really enjoyed it. And actually after graduating from undergrad, I had a place at Oxford to study medicine. And so I was. Initially planning on becoming a doctor, had everything planned and then decided to take a gap year after finishing undergrad.[00:04:28] Nikhila Ravi: And actually that was around the time that sort of deep learning was emerging. And in my machine learning class in undergrad, I remember one day our professor came in and that was when Google acquired DeepMind. And so that became like a huge thing. We talked about it for the whole class. It kind of really stuck.[00:04:48] Nikhila Ravi: And I was kicked off thinking about, okay, maybe I want to try something different other than medicine. Maybe this is a different path I want to take. And then in the gap year, I did a bunch of coding, worked on a number of projects. Did some sort of freelance contracting work. And then I got a scholarship to come and study in America.[00:05:06] Nikhila Ravi: So I went to Harvard for a year, took a bunch of computer science classes at Harvard and MIT, worked on a number of AI projects, especially in computer vision. I really, really enjoyed working in computer vision. I applied to Facebook and got this job at Facebook, and I've now at Facebook at the time, now Meta, and I've been here for seven years, so very circuitous path, probably not a very unconventional, I didn't do a PhD, I'm not like a research, typical research scientist, definitely came from more of an engineering background, but since being at Meta, Have had amazing opportunities to work across so many different interesting problems in computer vision from 3D computer vision.[00:05:50] Nikhila Ravi: How can you go from images of objects to 3D structures and then going back to 2D computer vision and actually understanding the objects and the pixels and the images themselves. So it's been a very interesting journey over the past seven years.[00:06:05] swyx: It's weird because like, I guess with segment anything too, it's like 4D because you solve time, you know, you started with 3D and now you're solving the 4D.[00:06:14] Nikhila Ravi: Yeah, it's just going from 3D to images to video. It's really covering the full spectrum. And actually, one of the nice things has been, so I think I mentioned I, Wanted to become a doctor, but actually Sam is having so much impact in medicine, probably more than I could have ever had as a doctor myself. So I think, you know, hopefully Sam too can also have a similar sort of impact in medicine and other fields.[00:06:39] The Impact of SAM 1 in 2023[00:06:39] swyx: Yeah. I want to give Joseph a chance to comment. Does that also mirror your, we know your story about going into, into vision, but like in the past year, since we did our podcast on Sam what's been the impact that you've seen?[00:06:51] Joseph Nelson: Segment anything. 
Set a new standard in computer vision, you know recapping from from the first release to present Sam introduces the ability for models to near zero shot meaning without any training identify kind of perfect polygons and outlines of items and objects inside images and that capability previously required a Lots of manual labeling, lots of manual preparation, clicking very meticulously to create outlines of individuals and people.[00:07:25] Joseph Nelson: And there were some models that attempted to do zero shot segmentation. of items inside images, though none were as high quality as segment anything. And with the introduction of segment anything, you can pass an image with SAM1, SAM2 videos as well, and get perfect pixel perfect outlines of most everything inside the images.[00:07:52] Joseph Nelson: Now there are some edge cases across domains and Similar to the human eye, sometimes you need to say, like, which item maybe you most care about for the downstream task and problem you're working on. Though, SAM has accelerated the rate at which developers are able to use computer vision in production applications.[00:08:13] Joseph Nelson: So, at RoboFlow, we were very quick to enable the community of computer vision developers and engineers to use SAM and apply it to their problems. The principle ways of using SAM, you could kind of use SAM as is to like pass an image and receive back masks. Another use case for SAM is in preparation of data for other types of problems.[00:08:37] Joseph Nelson: So, for example, in the medical domain, let's say that you're working on a problem where you have a bunch of images from a wet lab experiment. And from each of those images, you need to count the presence of a particular protein that reacts to some experiment. To count all the individual protein reactions, You can go in and lab assistants to this day will still like kind of individually count and say what are the presence of all those proteins.[00:09:07] Joseph Nelson: With Segment Anything, it's able to identify all of those individual items correctly. But often you may need to also add like a class name to what the protein is. Or you may need to say, hey, like, I care about the protein portion of this. I don't care about the rest of the portion of this in the image.[00:09:26] Joseph Nelson: And, or what it encourages and asks for the user to do is to provide some visual prompting to say, hey, which part, like, Sam says, hey, I can find segments of anything, but which segments do you care about? And so you can do visual prompting, which is kind of a new primitive that Sam introduced. And so at RoboFlow, we have one portion of our tool stack enables users to very quickly label data.[00:09:48] Joseph Nelson: With segment anything, Sam can already provide, hey, here's where I see the outlines of objects. Or a user can click to prompt to say, Hey, here's where the outlines of objects matter. And I recently pulled statistics from the usage of SAM in RoboFlow over the course of the last year. And users have labeled about 49 million images using segment anything on the hosted side of the RoboFlow platform.[00:10:12] Joseph Nelson: And that's like 5 million in the last 30 days alone. And of those images, We did kind of like a rough bafka napkin calculation of like how much time that has saved. 
Because, again, the alternative is you're clicking individual points to create a polygon, and with SAM you just click once and it guesses where the polygon is.[00:10:32] Joseph Nelson: And I'm sure in a bit we can maybe screen share and show some examples of what this experience is like. And in that time estimation, it's like, On average saves, you know, maybe a dozen or so seconds. And we estimate that this is probably saved on the order of magnitude of 35 years of time for users.[00:10:53] Nikhila Ravi: That's incredible.[00:10:54] Joseph Nelson: So, I mean, basically like in the first, the first year of a model being available, not only can you say, Hey, I'm just going to go use this model, those numbers that like 49 million images. is an estimate directly related to just the hosted side. So imagine all of the users that are self hosting or using SAM for robotics applications or out in the field or offline where it's not even, like, the time or the image counts are tabulated.[00:11:20] Joseph Nelson: And we're probably talking about, you know, just a fraction of the amount of value that's actually being produced for a number of downstream tasks. So to say that the impact has been You know, people use terms like game changing and these sorts of things. It has changed the industry. It's set a new standard.[00:11:36] Joseph Nelson: And with the release of SAM 2, I think we're about to see an acceleration of those capabilities for a lot of reasons.[00:11:42] Nikhila Ravi: That's really great to hear. I think one of the, really SAM 1 was. How many fields actually rely on manual segmentation? I think we're not really exposed to that. Maybe you are at Roboflow because you get to see all the users of these tools.[00:11:57] Nikhila Ravi: But for me, it was, you know, people working on understanding coral reef bleaching or farmers counting their cows and so many different applications that as a researcher. You never get exposed to, but you can have impact towards. So I think that was really awesome to hear.[00:12:15] Do People Finetune SAM?[00:12:15] swyx: So as sort of audience surrogate, who knows less than the two of you, I'm going to ask a really dumb question maybe, but is everyone using stock, a segment, anything?[00:12:23] swyx: Are they fine tuning for the medical domain? Like how on earth could it work for the medical field without fine tuning, right? Like, is that a thing?[00:12:32] Nikhila Ravi: So I mean, I can give a quick perspective from the research side. So one of the things, design decisions we made in SAM was to not have class labels. And so all the data is annotated in a class agnostic way.[00:12:48] Nikhila Ravi: So anything that has a boundary, we consider to be an object. So for example, in any image, there's lots of small objects. We might not know what the name of them are, but they're If you can draw a boundary around it, so you can imagine that we have 11 million images in the SA 1B dataset, we annotated all the objects, there's many, many small objects.[00:13:12] Nikhila Ravi: And so if you think about cells, they're also kind of small objects, there's probably things in the training data. That looked like it, but we didn't have to label it. 
And so that means that even when you use SAM for applications that it wasn't really trained for, because we didn't restrict it to a certain set of categories, you can actually use it out of the box without custom adaptation.[00:13:35] Nikhila Ravi: But having said that, there's probably certain domains where you need some expertise in order to be able to segment something properly. And for those use cases, Having some extra fine tuning data would probably help, and we've sort of seen that there's some papers that have come out that do this, and, you know, we'd love to hear, Joseph, how people are collecting data with SAM and fine tuning for their use cases.[00:13:59] Joseph Nelson: Once SAM came out, there were adaptations that said, could we use SAM to be, you know, like, efficient SAM? Like, basically take SAM and maybe accelerate it. And then there were domain adapted SAMs, like CellSAM, for example, out of the UC system. Now, what's interesting is, there's, like, adapting SAM to a domain, there's kind of two ways by which that's done.[00:14:21] Joseph Nelson: One is, as you mentioned, like, potentially SAM doesn't have a good concept of The objects of interest. And so you need to do domain adaptation and increase the accuracy for zero shot prediction. The second way though, is it's not fine tuning. It's actually just prompting. It's just guiding the model existing knowledge.[00:14:42] Joseph Nelson: to say which segments you care about. And both those are actually kind of equally important on the application side. You need to, like, a priori ensure that the objects of interest can be correctly segmented and maybe collect data to do that. But even if you had, like, a perfect SAM, like an omniscient SAM that could see every segment in every domain with all pixels perfectly outlined, in production, you would still need some way to Almost like signal to the model what you care about like to paint this picture if you are like a retailer and you are providing Photos of models wearing your clothing on your retail site You may care about you know only the shirt and Sam by default might segment the full person And so there's you know visual prompting that you can do to ensure that you only outline Maybe the shirt for the purposes of swapping in and out different shirts for displaying a given model on a retail page You And so I think what's interesting is that's where, like I wouldn't call it domain adaptation, but that's where, like, when you apply to industry, like, one thing that's particularly important with tooling and enabling SAM to reach its full potential.[00:15:51] swyx: That's really encouraging to hear. I should also think, like, you know, the last time we talked about this, we wanted to, the very natural addition on the class labeling side is the grounding Dino work, right? So I think people, built a grounding SAM and all the other extensions.[00:16:05] Video Demo of SAM[00:16:05] swyx: I think it's, it's probably a good time to cut to a quick demo of SAM2 for people who are, who are tuning in for SAM2 and who better to demo SAM2 than Nikki.[00:16:15] Nikhila Ravi: Sure. So I'll try to narrate what I'm what I'm doing. So audio listeners can also understand. So we have a web demo where anyone can try SAM2 on a video. Here we have a video of someone kicking a football, and I'm going to click on the football to select the object in the first frame. 
But you can actually select the object in any frame of the video, and this will work.[00:16:40] Nikhila Ravi: The next step is to hit track. So the model's now tracking this in real time. We don't save any of this, it's all running in real time. And now you can see the ball has been tracked throughout the entire video. There's even like a little bit of a challenging case here where the shoe covers the football.[00:16:59] Nikhila Ravi: And actually, you know, the model makes a little bit of a mistake, but that's okay. Because we can actually, here, the model makes a little bit of a mistake here. But you know, we can actually add a refinement click. You can add negative clicks until we get the mask that we want on this frame. And then you can hit track again, and the model will track the object, taking into account the additional information I've provided at that frame.[00:17:25] Nikhila Ravi: We've also added a couple of other fun things you can do on top of the track, like add effects. We can add you know, foreground effects, background effects. And these are just ways of showing how we can use the output from SAM2 as part of other tools like video editing tools. Other systems, so this is just a preview of what you can do with SAM2, but the really cool use cases are places where we might not have even imagined SAM2 being useful.[00:17:54] Nikhila Ravi: So we have a number of examples of things you might want to use it for. There's like underwater videos that it works actually really well for even though we, models never really seen an octopus before and octopus have a lot of moving parts that SAM2 can actually quite effectively. Keep track of all the different tentacles and we can probably see it more clearly if I desaturate the background.[00:18:18] Nikhila Ravi: We can see that actually the tracking of all the different tentacles is Quite accurate. Another challenge with video is that objects can actually become occluded. They can disappear from view and reappear. And a really fun example here is the shuffling cup game, which many of you might have seen. And so here I can click on the ball in the first frame.[00:18:41] Nikhila Ravi: I can also, You know, click on a different cup. And so here, the additional challenge is that there's three cups that look exactly the same. And then there's the ball that will get occluded by the cup. So the ball's no longer visible, the cups are all moving around, they all look the same. But the model actually keeps track of the cup that we selected.[00:19:02] Nikhila Ravi: And, as you can see at the end, here I'll jump to the end so you can see. It actually finds the cup again. I wanted to point out a couple of fun demo UX features that we added that actually really helped with this. So if you can see at the bottom, there's these swim lanes and then the swim lanes, actually the thickness of the swim lane tells you if the object's visible or not.[00:19:22] Nikhila Ravi: So at the beginning, the object's visible,[00:19:25] swyx: the object[00:19:26] Nikhila Ravi: disappears, and then the object comes back. So you can actually visually tell. When the object's being occluded and when it's not, and so it's a nice way of like, knowing if you need to go in and fix the model prediction or not. And so these are some of the UX innovations that we came up with, as well as the model innovations.[00:19:46] Joseph Nelson: One thing that I think is really notable here, there's two things. 
One is that like, I'd love to have a little bit of a discussion about how the models keeping track of the embedded scene to keep track of the ball and the cup in different places. Put a pause on that for a second.[00:19:59] Why the Demo is so Important[00:19:59] Joseph Nelson: One thing that Meta has put an emphasis on here in a much greater degree than other model releases is the demo experience of recognizing that in addition to having a model that can do zero shot segmentation, you've created a web experience that allows folks to kind of experience both the video effects but the types of UX innovations that encourage usage and adoption.[00:20:23] Joseph Nelson: It's actually kind of reminiscent of The underlying technology of ChatGPT was available prior to the web experience of ChatGPT. Can you talk a bit about why that was a consideration to your team and how you thought about the creation of The demo experience in tandem with training and releasing a new model.[00:20:41] Nikhila Ravi: Yeah, absolutely. I think that's a really great example of how, you know, Chad, GPT was really more of a UX innovation. Obviously it was like a number of research innovations that helped to get to this point. But as you said, like the underlying technology was around for a while. And, you know, putting this UX around as a chat interface helped tremendously with the.[00:21:03] Nikhila Ravi: Adoption and people understanding how it could be useful for real world use cases. And in computer vision, especially, it's so visual. The best way to show how these models work. Is by trying it on your own image or your own video with the original SAM, we put a lot of effort in building like a high quality demo.[00:21:23] Nikhila Ravi: And the other piece here is that the demo is actually the annotation tool. So we actually. Use the demo as a way to improve our annotation tool. And so then it becomes very natural to invest in building a good demo because it speeds up your annotation and improves the data quality and that will improve the model quality.[00:21:43] Nikhila Ravi: With this approach, we found it to be really successful. And obviously externally, people really liked being able to try it. I think, you know, people in fields outside of machine learning would never have tried SAM if we didn't have that demo. And I think that definitely led to a lot of the adoption in, like, diverse fields.[00:22:05] Nikhila Ravi: And so because we saw that with SAM 2, like, the demo was a priority first class citizen from day one. And so we really invested in making that. And I think with SAM2 as well, we wanted to have like a step change in the demo experience. Interactive video segmentation, I think that experience is something that maybe has not had much thought given to it.[00:22:27] Nikhila Ravi: And we really wanted to be like, okay, if we are to design a step changing video segmentation experience, what would that look like? And that really did influence our model. And annotation design as well.[00:22:40] Joseph Nelson: It's a really encouraging trend for not thinking about only the new model capability, but what sort of applications folks want to build with models as a result of that downstream.[00:22:49] Nikhila Ravi: I think it also really forces you to think about many things that you might postpone, for example, efficiency.[00:22:55] Joseph Nelson: Yes.[00:22:55] Nikhila Ravi: For a good demo experience. Making it real time is super important. No one wants to wait. 
And so it really forces you to think about these things much sooner and actually makes us think about how to, what kind of image encoder we want to use or like other hardware efficiency improvements.[00:23:13] Nikhila Ravi: So those kinds of things, I think, become a first class citizen when you put the demo first.[00:23:19] SAM 1 vs SAM 2 Architecture[00:23:19] Joseph Nelson: That's one thing I was going to ask about, and this is related to the architecture change. So SAM1 and the SAM1 demo experience. You have the encoder that's creating the embeddings of all the potential spaces.[00:23:31] Joseph Nelson: That needs to be run on a GPU. That's a relatively intensive operation. But then the query of those embeddings can be run independently and on a cheaper process. So in the SAM1 demo, the way that it was structured, and also this is the way that we have our SAM tool structured in Roboflow as well, is images go to a GPU to get all the SAM based embeddings.[00:23:53] Joseph Nelson: But then for querying those embeddings, we do that client side, in the browser, so that the user can very quickly, you know, you can move your mouse over and you get the proposed candidate masks that Sam found for that region of the image. In SAM 2 you dropped that in the web demo. And I think that's because you made some notable improvements to the rate at which encoding happens.[00:24:16] Joseph Nelson: Can you talk a bit about what led to those speed increases and, again, how that interplays with providing a fast user experience for interacting with the model.[00:24:29] Nikhila Ravi: Yeah. So the SAM2 web demo is primarily focused on video. We, we decided to just keep it simple and focus on video and on GitHub, we have a Colab notebook that shows how to run SAM2 on images.[00:24:41] Nikhila Ravi: So if you're interested in using, replacing SAM with SAM2 for images, check out GitHub, but on the SAM2 demo, it's not as straightforward to adopt the same architecture as SAM for video, because we can't send the per frame image embeddings for an entire video back to the front end. In SAM, each frame embedding was like four megabytes, but if you have a long video and that's like per frame, it would become impossible to send that back to the front end.[00:25:11] Nikhila Ravi: So, SAM 2 actually, in terms of the architecture details, I was actually just looking at this earlier, but SAM1 model was around 630 million parameters. It's a fraction of the size of these large language models, but very small. Actually, SAM2, the largest model, is around 224 million parameters. So it's actually one third the size of the original SAM model.[00:25:38] Nikhila Ravi: So we changed the image encoder from a ViT-H in SAM to a Hiera model, which was also developed by Meta. So that definitely was something that helped. And in terms of the efficiency compared to SAM, so if we were to run SAM per frame on a video or run SAM 2, it's around six times faster to run SAM 2 versus run SAM per frame.[00:26:03] Nikhila Ravi: A number of things improved the efficiency of SAM2 such that we were actually able to run this entirely on the server and not have any component in the front end.
But I am very curious to see who puts this on device, like I'm pretty sure soon we'll see like an on device SAM2 or, you know, maybe even running in the browser or something, so.[00:26:25] Nikhila Ravi: I think that could definitely unlock some of these edge use cases that we were able to make a compelling web demo without having to do that.[00:26:34] swyx: Hugging face is probably already working on Transformers. js version of it, but totally makes sense. I want to talk about more about things from the paper, but I think we're still in this sort of demo section.[00:26:42] Video Demo of SAM on Roboflow[00:26:42] swyx: And so I want to hand it to Joseph for his demo to see what the RoboFlow site looks like.[00:26:47] Joseph Nelson: So I can, I can give some context into one key area that Nicola, you mentioned earlier, which is. Sam has made the decision, both Sam 1 and Sam 2, to be class agnostic in terms of its predictions. And that, you then have the ability to have a generalizable, model for zero shot capability.[00:27:05] Joseph Nelson: However, in a lot of domain applications, you do want the class wise name. And so a lot of the challenge can be adding that class wise name for the, at least the annotation to an experience that we've created. That's one of the key considerations. So I will similarly Share my screen and show an example.[00:27:27] Joseph Nelson: Here, I have a bunch of images, and there's a number of ways that I could annotate things, like I could prompt a large multimodal model with like grounding capabilities, you know, you could outsource it, or I can do manual labeling. And with the manual labeling, this is where we make use of models like segment anything.[00:27:45] Joseph Nelson: to propose candidate masks and make it faster. So we have, you know, this annotation pane and what we call the smart poly tool, which is powered by Segment Anything. This is currently Segment Anything 1. We're accelerating and seeing improvements from similar to what the paper shows of Segment Anything 2 performed better on E3.[00:28:06] Joseph Nelson: Images as well as video, but with a segment, anything I'm able to basically prompt regions of my image of interest. So for example, if like, I wanted to say, I want to like add the drum set. You'll see here that like, the original candidate proposal is just the base drum, but let's say I wanted the whole drum set.[00:28:26] Joseph Nelson: So the UX primitive of being able to add and subtract candidate regions of interest is really intuitive here. And now, great, I have this outline, but in fact what I want is, I want to name that as a class. Because maybe for the model that I'm building, I want to build like a task specific model, you know, like an object detection model or an instant segmentation model.[00:28:50] Joseph Nelson: Or, you know, maybe I'm even using like a multimodal model and I want that multimodal model to refer to regions of interest in the images as a specific thing. And so I think what's, you know, really powerful is, of course, like, I get this really rich zero shot prediction. And here we have our friend Rick.[00:29:10] Joseph Nelson: So I get this really rich candidate set of predictions. But then by adding the class wise label, I can, you know, very quickly make sure that any downstream tasks are aware not just of the segment, but also of the, what is inside that segment. 
Which actually takes me to A separate point of something that I predict that's probably going to happen and Nikhil, I'm actually kind of interested why maybe your team made a conscious decision to not do this initially with SAM2.[00:29:40] Joseph Nelson: There's been an emergent set of models that are also adding open text prompting capabilities to grounding models. So for example, like you've seen models like Grounding Dino or Owlvit, which, you know, you can do. Even image to image or text to image based prompting to find regions of interest. And maybe maybe I can actually give an example of that even in the context of this same data.[00:30:05] Joseph Nelson: So if I wanted to try out, you know, grounding dino on this same set of images, I could try out, you know, prompting grounding dino for a set of different classes. And what's notable is let's do, I don't know, let's prompt for person and we'll prompt for person and prompt for I don't know, microphone.[00:30:26] Joseph Nelson: NLASC or microphone. Here I can text prompt the image and then the understanding, in this case Grounding Dino's understanding, of where people are in this image allows me to create, in this case, bounding boxes, but, you know, soon you can do segmentations or in tandem with SAM do segmentations. And, you know, we've already seen applications of using SAM2 in tandem with models like Grounding Dino or Florence 2.[00:30:54] Joseph Nelson: So that people can basically text prompt and then get the benefits of the zero shot segmentation at the same time as getting the open form querying. And in doing so, you know, we maintain a framework called like autodistill so like folks can very quickly, you know, bring some images and then using autodistill to find some ontology and then prompt and say what you want from that ontology.[00:31:19] Nikhila Ravi: So you already do this for video as well?[00:31:21] Joseph Nelson: You can apply videos or groups of images, yes. So this is using a project called Autodistill. And the concept of Autodistill is, use a base model, like a big base model, which could be like SAM or Grounding Dino, and then you pass a directory of images, which also could be video, broken into individual frames, and you pass an ontology as well.[00:31:43] Joseph Nelson: So an example I was just showing was like the hello world we have, which is like a shipping container. And then the combination of the grounding capabilities of, in the example I was showing, Florence 2 plus SAM, looks for the concept of container, and then SAM does the rich segmentation of turning that concept of container into the candidate proposal of the region, so that a user could just say, hey, I want all the shipping containers, run this across a bunch of images or video frames, And then get back the class wise labels plus the regions of interest.[00:32:17] Joseph Nelson: And this feels like a natural extension. And in fact, like the open form grounding capabilities between SAM1 and SAM2 became something the field was broadly doing. So I'm curious, like, from your perspective, one of the things I thought maybe SAM2 would do is actually add this capability natively. So I'm curious to hear, like, the conscious decision to say, hey, we want to continue to be class agnostic.[00:32:39] Extending SAM 2 with other models[00:32:39] Joseph Nelson: We don't want to add yet maybe open form text prompting as a part of finding the segments and parts of images. And I'd love to hear about like the decision to think about it that way. 
And if you are encouraged or if you want kind of like what's happening here where people are naturally combining these capabilities as something that you would expect and encourage to happen despite not having it.[00:33:00] Joseph Nelson: In the base model itself.[00:33:02] Nikhila Ravi: Yeah, it's a great question. So I think it's really cool that the community is taking SAM and taking SAM 2 and building on top of it and coming up with cool applications. We love to see that. That's exactly why we open source our work. And then in terms of why we didn't put it into SAM 2, so as you've probably seen with SAM and SAM 2, it's a fairly narrow problem.[00:33:25] Nikhila Ravi: But we really tried to make it a step change in the capability. And so with each version, we are trying to limit the focus on one thing that we can know we can do really well. And in this case, like the first SAM, it was class agnostic segmentation, but can we do it so well that it's effectively solved?[00:33:47] Nikhila Ravi: And similarly, can we do that same thing, but with video segmentation? So one step at a time, we are working on each of these problems one at a time so that we can actually deliver something that's really world class and step changing.[00:34:03] Joseph Nelson: So does that mean SAM 3 will have the text prompting? Problem is like the next challenge.[00:34:09] Nikhila Ravi: Who knows, who knows? Maybe the community will, will we'll build that too. So[00:34:15] Joseph Nelson: it makes sense to like very narrowly do something very well. And that's, I think, proven to be well accomplished.[00:34:21] Nikhila Ravi: It's like taking the, the, both the data, the model and the demo, and how can we push all three towards solving one thing really well?[00:34:30] Nikhila Ravi: So we found that. That's like a good recipe and that's what we've limited the focus of these, of each of these models.[00:34:38] swyx: This development reminds me of how, you know, when you do, and you break out the interpretability of ConvNets and you can see like, Oh, this is the edge detection one. I feel like SAM is the edge detection version equivalent.[00:34:51] swyx: And then you build up to whatever the next feature is on top of that.[00:34:54] Limitations of SAM: Screenshots[00:34:54] Joseph Nelson: Can I bring up one? Limitation of SAM. So like we've like even SAM one, SAM two, and the model was released at 4 PM Pacific on Monday. We're recording this on 11 AM Pacific on, on, on Thursday. So the, it's very fresh for a lot of the capabilities and.[00:35:09] Joseph Nelson: It is so clear that it is a stepwise change in the capability that, Nikhila, you mentioned your team wants to do, which is extend SAM's zero shot class agnostic capability to video, like, A plus, kind of mission accomplished. One thing that's interesting is finding, like, domain problems where there might be still domain applicability and domain adaptation that is available.[00:35:32] Joseph Nelson: One benchmark that we introduced at CVPR is this thing called RF100, which is like, seven different domain type problems that the industry commonly is working on in vision, like underwater, document processing, aerial examples, medicine examples.
And one place where interestingly segment anything maybe less performant than other models is handling screenshots.[00:35:57] Joseph Nelson: For example, like a lot of folks that are building agents to interact with the web are particularly interested in that challenge of given a screenshot of a computer, what are all the buttons. And how could I autonomously navigate and prompt and tell it to click? And I can show an example of like maybe what, how like Sam kind of performs on this challenge just to outline some of the context of this problem.[00:36:23] Joseph Nelson: But I'm curious like how you think about limitations like this and what you would expect to want to be the case. So here I just have a notebook where I run Sam on the source image on the left. Or the source image on the left and then Sam output is on the right. And this is just a screenshot of, of a website where we just grab like the top 100 websites by traffic and grab screenshots from them.[00:36:42] Joseph Nelson: One example of a place where I could see the community improving on Sam, and I'm curious how you think about this challenge and maybe why Sam is less well adapted for this type of problem. Is processing screenshots. So I'll share my screen to give an example for, for viewers that are participating here, you see like an example, a screenshot of a website on the left, and then right is SAM two running on that image.[00:37:06] Joseph Nelson: And in the context of agents, folks usually want to have like, Hey, tell me all of the buttons that a, an agent could press. Tell me like maybe the headlines of the articles tell me the individual images and Sam two behaves perhaps predictably, where it outlines like people in the images and like some of like the, the screen text.[00:37:22] Joseph Nelson: I'm curious, like, how you think about a challenge like this for a model that sees everything in the world, what about handling digital contexts? And Why maybe it could perform better here and how you would expect to see improvement for domains that might have been out of distribution from the training data?[00:37:40] Nikhila Ravi: Yeah, this is a good question. So fair, we don't really build with a specific use case in mind. We try to build like these foundational models that can be applied to lots of different use cases out of the box. So I think in this kind of example, potentially people might want to annotate some data.[00:37:59] Nikhila Ravi: Fine tune on top of what we release. I think we probably won't build things that are very custom for different use cases. I think that's not a direction we'll go in, but as you said, like the model is an annotation tool to improve the model. And so I think that's definitely the approach we want to take is we provide the tools for you to improve the model as well as the model itself.[00:38:27] Joseph Nelson: That makes sense. Focus on like as many. Multi or zero shot problems and then allow the community to pick up the torch for domain adaptation.[00:38:34] Nikhila Ravi: Yeah, absolutely. Like, we can't solve all the problems ourselves. Like, we can't solve all the different domains. But if we can provide a sort of base hammer tool, and then people can apply it to all their different problems.[00:38:48] SAM 2 Paper[00:38:48] swyx: If you don't mind, I guess we want to transition to a little bit on like asking more questions about the paper.[00:38:53] Udio AI: Sure.[00:38:54] swyx: There's a lot in here. 
I love the transparency from Meta recently with like LLAMA 3 last week and then, and was it last week? Maybe, maybe a little bit less than last week. But just like just really, really well written and a lot of disclosures, including the data set as well.[00:39:08] SA-V Dataset and SAM Data Engine[00:39:08] swyx: I think the top question that people had on the data set, you know, you release a diverse videos and there was, there's a lot of discussion about the data engine as well, which I really love. And I think it's innovative if you wanted. I think the top question is like, how do you decide the size of data set?[00:39:22] swyx: You know, what were you constrained by? People are asking about scaling laws. You had some ablations, but as a research manager for this whole thing, like how do you decide what you need?[00:39:32] Nikhila Ravi: Yeah. I mean, it's a great question. I think it's, as with all papers, you write them at the end of the project, so we can put these nice plots at the end, but going into it, I think, you know, the data engine design really follows.[00:39:47] Nikhila Ravi: So, this is sort of the model design, how we thought about the task, how we thought of the model capabilities. You can really see it's reflected in the different phases of the data engine. We started with just SAM, we apply SAM per frame. That's like the most basic way of extending SAM to video. Then the most obvious thing to do is to take the output masks from SAM and then provide it as input into a video object segmentation model that takes the mask as the first frame input.[00:40:19] Nikhila Ravi: And that's exactly what we did. We had SAM plus a version of SAM2 that only had mask as input. And then in the last phase, we got rid of SAM entirely and just had this one unified model that can do both image. And video segmentation. And I can do everything in just one model. And we found that, you know, going from each phase, it both improved the efficiency and it improved the data quality.[00:40:46] Nikhila Ravi: And in particular, when you get rid of this two part model, one of the advantages is that when you make refinement clicks, so, You prompt the model in one frame to select an object, then you propagate those predictions to all the other frames of the video to track the object. But if the model makes a mistake and you want to correct it, when you have this unified model, you only need to provide refinement clicks.[00:41:14] Nikhila Ravi: So you can provide maybe a negative click to remove a region or a positive click to add a region. But if you had this decoupled model, you would have to Delete that frame prediction and re annotate from scratch. And so you can imagine for more complex objects, this is actually adding like a lot of extra time to redefine that object every time you want to make a correction.[00:41:39] Nikhila Ravi: So both the data and the data engine phases really follow, like how we thought about the model design and the evolution of the capabilities, because it really helped us to do that. improve the data quality and the annotation efficiency as well.[00:41:54] swyx: Yeah, you had a really nice table with like time taken to annotate and it was just going down and down.[00:41:58] swyx: I think it was like down by like 90 percent by the time you hit stage[00:42:02] Joseph Nelson: three, which is kind of cool. We joke that when SAM 1 came out at RoboFlow, we're like, was this purpose built for our software? 
Like you have like the embedding, you have the embedding take like a big model and the querying of the embeddings A smaller model that happens in browser, which felt remarkably aligned.[00:42:18] Joseph Nelson: Now hearing you talk about how you think about building models with a demo in mind, it makes sense. Like, you're thinking about the ways that folks downstream are going to be consuming and creating value. So, what felt like maybe a coincidence was perhaps a deliberate choice by Meta to take into account how industry is going to take Seminal advances and apply them.[00:42:36] Nikhila Ravi: Yeah. And it's not just humans. Like it could also be a model that outputs boxes that then get fed into this model. So really thinking about this as a component that could be used by a human or as a component, as part of a, of a larger AI system. And that has, you know, a number of design requirements. It needs to be promptable.[00:42:56] Nikhila Ravi: It needs to be, have the zero shot generalization capability. We, you know, need it to be real time and. Those requirements really are very core to how we think about these models.[00:43:08] Memory Attention to solve Video[00:43:08] swyx: I cannot end this podcast without talking about the architecture, because this is your, effectively the sort of research level, architecture level innovation that enabled what I've been calling object permanence for SAM.[00:43:22] swyx: And it's memory retention. What was the inspiration going into it? And you know, what did you find?[00:43:27] Nikhila Ravi: Yeah, so at a high level, the way we think about extending SAM to video is that an image is just a special case of a video that just has one frame. With that idea in mind, we can extend the SAM architecture to be able to support segmentation across videos.[00:43:45] Nikhila Ravi: So this is a quick video that shows how this works. So SAM architecture, we have the image encoder, we have a prompt encoder, we have a mask decoder. You can click on an image. And that basically is a prompt, we use that prompt along with the image embedding to make a mask prediction for that image. Going to SAM2, we can also apply SAM2 to images because we can, you know, as I said, treat an image as a video with a single frame.[00:44:15] Nikhila Ravi: And so when we, in the SAM2 architecture, we introduce this new memory mechanism that consists of three main components. There's memory attention, there's a memory encoder, and then there's a memory bank. And when we apply SAM2 to images, these are effectively not used. And the architecture just collapses down to the original SAM architecture.[00:44:35] Nikhila Ravi: But when we do apply this to video, the memory components become really useful because they provide the context of the target object from Other frames. And so this could be from past frames. It can be from, there's two types of memory. So there's like the condition, conditional frames or the prompted frames, which are basically the frames at which a user or a model provides input like clicks.[00:45:01] Nikhila Ravi: And then there's like the surrounding frames. And say we use six frames around the current frame as memory of the object. So there's, there's those, those, both those types of memory that we use to make the prediction. Going into a little bit more detail about that, there's like two kinds of memory that we use.[00:45:18] Nikhila Ravi: So one is like spatial memory. So it's like this high resolution memory that captures the spatial details. 
And then we also have this like longer term object pointer memory that captures some of the sort of higher level concepts. And I think Swyx, you had a comment about how does this relate to sort of context window and LLMs.[00:45:37] Nikhila Ravi: And both of these types of memories have some relation to context window, so they both provide different types of information on the spatial side or in terms of the concept of the objects that we want to track. And so we found that having like six frame length for the spatial memory, Coupled with this longer period of the object pointer memory provides strong video segmentation accuracy at high speed.[00:46:01] Nikhila Ravi: So, as I mentioned, the real time aspect is really important. We have to find this speed accuracy trade off. And one way in which we sort of circumvent this is by allowing additional prompts on subsequent frames. So even if the model makes a mistake, maybe it loses the object. After an occlusion, you can provide another prompt, which actually goes into the memory.[00:46:24] Nikhila Ravi: And so the prompted frames are always in the memory. And so if you provide a prompt on a frame, we will, or the model will always remember what you provided. And so that's a way in which we can sort of avoid some of the model failure cases that actually is a big limitation of current models, current video object segmentation models.[00:46:45] Nikhila Ravi: Don't allow any way to recover if the model makes a mistake. And so, Joseph, going back to your point about the demo, that's something that we found just by playing with these models. There's no way to make a correction, and in many real world use cases, like, it's not going to be a one time prediction, but you actually want to be able to intervene, like, if an LLM makes a mistake, you can actually be like, no, actually do it this way, and provide feedback, and so, We really want to bring some of that thinking into how we build these computer vision models as well.[00:47:16] "Context Length" in Memory Attention[00:47:16] swyx: Amazing. My main reaction to finding out about the context length of eight input frames and six pass frames as their default is why not 60? Why not 600? In text language models, we're very used to severely extending context windows. And what does that do to the memory of your model?[00:47:35] Nikhila Ravi: So I think maybe one, one thing that's different is that the object in video, it is challenging.[00:47:41] Nikhila Ravi: Objects can, you know, change in appearance. There's different lighting conditions. They can deform, but I think a difference to language models is probably the amount of context that you need is significantly less than maintaining a long multi time conversation. And so, you know, coupling this. Short term spatial memory with this, like, longer term object pointers we found was enough.[00:48:03] Nikhila Ravi: So, I think that's probably one difference between vision models and LLMs.[00:48:09] Object Tracking[00:48:09] Joseph Nelson: I think so. If one wanted to be really precise with how literature refers to object re identification, object re identification is not only what SAM does for identifying that an object is similar across frames, It's also assigning a unique ID.[00:48:25] Joseph Nelson: How do you think about models keeping track of occurrences of objects in addition to seeing that the same looking thing is present in multiple places?[00:48:37] Nikhila Ravi: Yeah, it's a good question. 
I think, you know, SAM2 definitely isn't perfect and there's many limitations that, you know, we'd love to see people in the community help us address, but one definitely challenging case is where there are multiple similar looking objects, especially if that's like a crowded scene with multiple similar looking objects, keeping track of the target object is a challenge.[00:49:03] Nikhila Ravi: That's still something that I don't know if we've solved perfectly, but again, the ability to provide refinement clicks, that's one way to sort of circumvent that problem. In most cases, when there's lots of similar looking objects, if you add enough refinement clicks, you can get the perfect track throughout the video.[00:49:22] Nikhila Ravi: So definitely that's one way to solve that problem. You know, we could have better motion estimation. We could do other things in the model to be able to disambiguate similar looking objects more effectively.[00:49:35] swyx: I'm just interested in leaving breadcrumbs for other researchers, anyone interested in this kind of architecture.[00:49:41] swyx: Like, are there papers that you would refer people to that are influential in your thinking or, you know, have other interesting alternative approaches?[00:49:49] Nikhila Ravi: I think there are other ways in which you can do tracking in video. You might not even need the full mask. There are some other works that just track, like, points on objects.[00:49:59] Nikhila Ravi: It really, really depends on what your application is. Like if you don't care about the entire mask, you could just track a bounding box. You could just track a point on an object. And so having the high fidelity mask might not actually be necessary for certain use cases. From that perspective, you might not need the full capabilities[00:50:19] Nikhila Ravi: of SAM or SAM2. There's many different approaches to tracking. I think I would encourage people to think about what they actually need for their use case and then try to find something that fits, versus, yeah, maybe SAM2 is too much, you know, maybe you don't even need the full mask.[00:50:37] swyx: Makes total sense, but you have solved the problem that you set out to solve, which is no mean feat, which is something that we're still appreciating even today.[00:50:44] The Future of FAIR[00:50:44] swyx: If there are no further questions, I would just transition to sort of forward looking, future looking stuff. Joseph already hinted at, like, you know, our interest in SAM and the future of SAM, and obviously you're the best person to ask about that. I'm also interested in, like, how should external people think about FAIR, you know, like there's this stuff going on, this Llama, this Chameleon, this Voicebox, this ImageBind, like, how are things organized?[00:51:09] swyx: And, you know, where are things trending?[00:51:11] Nikhila Ravi: Yeah, so in FAIR, we, you know, we have a number of different research areas. I work in an area called perception. So we build vision systems that basically look at all the fundamental problems in computer vision. Can we build a step change in all of these different capabilities?[00:51:29] Nikhila Ravi: SAM was one example. SAM2 is another example. There are tons of other problems in computer vision where we've made a lot of progress, but can we really say that they're solved? And so that's really the area I work on.
And then there's a number of other research areas in language and in embodied AI.[00:51:49] Nikhila Ravi: And more efficient models and various other topics. So FAIR in general is still very much pushing the boundaries on solving these foundational problems across different domains. Well,[00:52:07] swyx: fair enough, maybe just outside of FAIR, just the future of computer vision, right?[00:52:10] CVPR, Trends in Vision[00:52:10] swyx: Like you are very involved in the community. What's the talk of the town at CVPR? Both of you went, who's doing the most interesting work? It's a question for both of you.[00:52:19] Joseph Nelson: I think the trends we're seeing towards more zero shot capability for common examples will accelerate. I think multimodality, meaning using, you know, images in tandem with text for richer understanding, or images and video in tandem with audio and other mixed media, will be a continued acceleration trend.[00:52:43] Joseph Nelson: The way I kind of see the field continuing to progress, the problem statement of computer vision is making sense of visual input. And I think about the world as the things that need to be observed follow your traditional bell curve, where like things that most frequently exist out in the world are on the center of that bell curve.[00:53:05] Joseph Nelson: And then there's things that are less frequently occurring that are in those long tails. For example, you know, as far back as, like, 2014, you have the COCO dataset, which sets out to say, hey, can we find 80 common objects in context, like silverware and fridge and these sorts of things. And we also conceptualized the challenge of computer vision in terms of breaking it down into individual task types, because that's like the tools we had for the day.[00:53:29] Joseph Nelson: So that's why, you know, you have the origination of classification, object detection, instance segmentation. And then as you see things continue to progress, you have models and things that need to observe areas in the long tails. And so if you think of the COCO dataset as the center of that bell curve, I think of the long tails as, like, really edge case problems.[00:53:49] Joseph Nelson: Some of our customers like Rivian, for example, only Rivian knows what the inside of, like, a Rivian should look like as it's assembled and put together before it makes its way to a customer, and they're making custom parts, right? So how could a model have been trained on the things that go inside the componentry of producing a vehicle? What's kind of happening with computer vision is you're seeing models that generalize in the middle of the bell curve push outward faster.[00:54:17] Joseph Nelson: That's where you see the advent of, like, open text models or the richness of understanding of multimodal models, to allow richer understanding without perhaps any training, or maybe just using pre training and applying it to a given problem. And then there's, like, you know, kind of like the messy middle in between those two, right?[00:54:38] Joseph Nelson: So like, Nikhila kind of talked about examples where SAM does well out of distribution, where, like, it finds an octopus, even though there weren't octopi in the training data.
I showed an example where, like, screenshots, where SAM isn't yet super great at screenshots, so maybe that's, like, in the messy middle or in the longer tails for now.[00:54:54] Joseph Nelson: But what's going to happen is there needs to be systems of validating the point of view that I think about, like, tooling to also validate that models are doing what we want them to do, adapting to datasets that we want them to adapt to. And so there's a lot of things on a forward looking basis that allow propelling that expansion of generalizability.[00:55:14] Joseph Nelson: That's for open text problems. That's where scaling up of training, of dataset curation, continues to play a massive role. Something that's notable, I think, about SAM2 is it's, what, 57,000 videos? 51,[00:55:30] Nikhila Ravi: 000 videos? About 51,000, yeah.[00:55:32] Joseph Nelson: And 100,000 internal datasets. That's, like, not massive, right? And the model size also isn't, you know, the largest, the largest model being a couple hundred million parameters.[00:55:43] Joseph Nelson: The smallest model is 38 million parameters and can run at 45 FPS on an A100, right? Like the capabilities of, we're going to see more capable, more generalizable models being able to run on a wider array of problems with zero or multi shot capability at a faster, a faster rate. And I think the architecture innovations in things like SAM2's memory, of increasingly, like, transformers making their way into vision, and probably blended architectures increasingly too.[00:56:15] Joseph Nelson: So my viewpoint of, like, on a go forward basis is we will have that bell curve of what humans can see, both in the center of that curve and the long tails. And architectural changes allow richer understanding, multi and zero shot, and putting those into systems and putting those into industry and putting those into contexts that allow using them in practical and pragmatic ways.[00:56:38] Joseph Nelson: Nikhila, I'd love to hear, like, your thought and perspective of how you think the research trends map or don't map to that. And, like, maybe some of the key innovations that you saw at CVPR this year that, you know, got you excited about the direction, and maybe some promising early directions that you're thinking about researching or pushing the boundaries of further.[00:56:56] Nikhila Ravi: Yeah, I just wanted to actually reply to a couple of things that you said, so actually in video object segmentation, the number of classes that are annotated in these, and then the size of these datasets, are really small. So with SAM, it's, you know, we had a billion masks, we had 11 million images, didn't have class labels.[00:57:17] Nikhila Ravi: But even before that, there were a lot of datasets that have class labels and are annotated with significantly more, with, like, a lot of class labels, whereas in video datasets, the number of class labels are very small. So there's like YouTube-VOS, which has 94 object categories, there's MOSE, which has around like 30 or so object categories.[00:57:38] Nikhila Ravi: And they're usually like people, there's cars, there's dogs and cats and all these common objects, but not really, they don't really cover a very large number of object categories. And so while SAM learned this general notion of what an object is in an image.
These video tracking models actually don't have that knowledge at all.[00:58:01] Nikhila Ravi: And so that's why having this dataset is really important for the segment anything capability in video, because if you just provide the mask as the input to an off-the-shelf video object segmentation model, it might not actually be able to track that arbitrary object mask as effectively as a SAM2 model that's actually trained to track[00:58:24] Nikhila Ravi: any object across the entire video. So this sort of combining of two models together to try to get a capability will actually only get you so far, and being able to actually create the dataset to enable that anything capability was actually really important, and we can actually see that when we do comparisons with baselines, where we provide SAM2 with the same input mask and the baseline model with the same input mask.[00:58:53] Nikhila Ravi: For example, the t-shirt of a person: SAM2 can track the t-shirt effectively across the entire video, whereas these baselines might actually start tracking the entire person, because that's what they're used to doing, and isolating it to just one part of the person is not something they were ever trained to do, and so those are sort of some of the limitations.
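For readers who want a concrete mental model of the memory mechanism described above (memory attention, a memory encoder, and a memory bank holding the prompted frames plus a short window of recent frames), here is a minimal illustrative sketch in Python. The class and method names, the six-frame window constant, and the overall wiring are assumptions inferred from the conversation, not Meta's actual SAM 2 implementation; see Meta's released SAM 2 code for the real thing.

from collections import deque

# Illustrative sketch of a SAM2-style memory-conditioned video segmenter.
# Every component below is a hypothetical stand-in, not the real SAM 2 API.

MEMORY_WINDOW = 6  # recent (non-prompted) frames kept as short-term spatial memory

class ToyMemorySegmenter:
    def __init__(self, image_encoder, prompt_encoder, mask_decoder,
                 memory_encoder, memory_attention):
        self.image_encoder = image_encoder        # per-frame feature extractor
        self.prompt_encoder = prompt_encoder      # encodes clicks, boxes, or masks
        self.mask_decoder = mask_decoder          # predicts a mask from conditioned features
        self.memory_encoder = memory_encoder      # turns (features, mask) into a memory entry
        self.memory_attention = memory_attention  # cross-attends frame features to memories
        self.prompted_memories = []               # memories from frames the user clicked on
        self.recent_memories = deque(maxlen=MEMORY_WINDOW)  # rolling spatial memory
        self.object_pointers = []                 # coarse, longer-term object summaries

    def process_frame(self, frame, prompt=None):
        feats = self.image_encoder(frame)

        # Condition the current frame on everything remembered about the target object.
        memories = self.prompted_memories + list(self.recent_memories)
        if memories:
            feats = self.memory_attention(feats, memories, self.object_pointers)

        # With no memories (an image is a one-frame video), this collapses to plain SAM:
        # encode the prompt and decode a mask from the image features alone.
        prompt_embedding = self.prompt_encoder(prompt) if prompt is not None else None
        mask, object_pointer = self.mask_decoder(feats, prompt_embedding)

        # Write the prediction back into memory so later frames can attend to it.
        memory_entry = self.memory_encoder(feats, mask)
        if prompt is not None:
            self.prompted_memories.append(memory_entry)  # prompted frames are never evicted
        else:
            self.recent_memories.append(memory_entry)    # recent frames roll off after six
        self.object_pointers.append(object_pointer)
        return mask

The failure-recovery behavior the guests emphasize falls out of this structure: because prompted frames sit in prompted_memories and are never evicted, a corrective click after an occlusion permanently re-anchors the track, while the six-frame deque keeps the per-frame cost roughly constant regardless of video length.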

That Real Blind Tech Show
Episode 157 - In Space No One Can Hear You Poop

That Real Blind Tech Show

Play Episode Listen Later Jun 30, 2024 114:33


It's an all new That Real Blind Tech Show. Allison, Brian, Jeanine, and David Goldstein, pinch hitting for Ed, get together to talk some tech and other nonsense. It is a long one, but well worth the listen, as we cover a lot of interesting topics and some insanity occurs along the way. We start off discussing the latest update to Twitter, which on iOS has reportedly broken a lot of accessibility, but the good news is Spring for Twitter still works on both iOS and Mac. Brian then discusses the CVPR conference he just spoke at in Seattle. The conference is all about computer vision and A.I. Next, Brian discusses his jaunt off to L.A. We then discuss the recent article that pooping on the Moon is a messy business. You've got to love articles like these! In a brilliant segue, it turns out Perplexity is a bullshit machine! No surprise that a research study has shown that when it comes to hiring people with disabilities, ChatGPT is ableist. Which then leads us to discussing the blind American adult unemployment rate. An A.I. Al Michaels will bring you daily personalized Olympic updates on Peacock. Yelp of all places is adding more ways to search for businesses and accessibility. Would you pay an extra five to ten bucks a month for a smarter Alexa? We then discuss fun with Google Sheets. Soon in Google Chrome Mobile you will be able to call restaurants, get reviews, and perform other tasks from the address bar. We then discuss what our go-to browsers are on both desktop and mobile. It is rumored that you will eventually be able to make Gemini sound like your favorite YouTuber. We then dive into the upcoming NFB and ACB conferences. Jeanine will be attending ACB, while Brian and Vice-President David will be at the NFB Conference in Orlando. While it is early in the iOS 18 beta cycle, we are hearing that currently iPhone Mirroring is completely inaccessible. Despite reports, no, Apple and Meta are not considering an A.I. partnership. And you will not want to miss the first ever and probably last ever version of That Real Blind Tech Show trivia. We then talk about the Meta Ray-Bans, which almost all of us are rocking now, and of course someone got a new laptop, I wonder who? And it's a quick What's Pissing Off Brian Now, and more of Watcha Streaming, Watcha Reading. To contact That Real Blind Tech Show, you can email us at ThatRealBlindTechShow@gmail.com, join our Facebook Group That Real Blind Tech Show, join us on the Twitter @BlindTechShow, or leave us an old school phone message at 929-367-1005.

The Artificial Intelligence Podcast
NVIDIA Unveils Cutting Edge Visual AI Breakthroughs at CVPR Conference

The Artificial Intelligence Podcast

Play Episode Listen Later Jun 17, 2024 3:45


NVIDIA is showcasing its latest advancements in visual artificial intelligence at the Computer Vision and Pattern Recognition conference in Seattle, featuring new models and techniques in areas such as custom image generation, 3D scene editing, and autonomous vehicle perception. The company's researchers are presenting over 50 projects, including two finalists for Best Paper Awards and a winning entry in the Autonomous Grand Challenge. Notable breakthroughs include a technique to rapidly customize diffusion models and a foundation model that can instantly understand and track 3D object poses in videos. --- Send in a voice message: https://podcasters.spotify.com/pod/show/tonyphoang/message

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Gen AI at the Edge: Qualcomm AI Research at CVPR 2024 with Fatih Porikli - #688

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 10, 2024 70:41


Today we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research. In our conversation, we covered several of the Qualcomm team's 16 accepted main track and workshop papers at this year's CVPR conference. The papers span a variety of generative AI and traditional computer vision topics, with an emphasis on increased training and inference efficiency for mobile and edge deployment. We explore efficient diffusion models for text-to-image generation, grounded reasoning in videos using language models, real-time on-device 360° image generation for video portrait relighting, unique video-language model for situated interactions like fitness coaching, and visual reasoning model and benchmark for interpreting complex mathematical plots, and more! We also touched on several of the demos the team will be presenting at the conference, including multi-modal vision-language models (LLaVA) and parameter-efficient fine tuning (LoRA) on mobile phones. The complete show notes for this episode can be found at https://twimlai.com/go/688.

WGAN Forum Podcast
298. Six Reasons Why Matterport Will Launch Project Genesis Monday (3 June 2024) | #mttr #csgp

WGAN Forum Podcast

Play Episode Listen Later Jun 2, 2024 13:09


Six Reasons Why Matterport Will Launch Project Genesis Monday (3 June 2024) Commentary and analysis by Dan Smigrod Founder and Managing Editor We Get Around Network Forum and WGAN-TV Podcast Hi All, Matterport will likely announce on Monday, 3 June 2024 that Project Genesis is now commercially available to all. If that's true, what I wrote about a year ago (and reprinted below for your convenience), becomes relevant and timely: ✓ Matterport Genesis Generative AI Magic: Will it Make Competitors Irrelevant? "Project Genesis" was first announced 14 June 2023 that it was in development, would include these super-powerful features: ✓ decluttering of Matterport digital twins (residential and commercial spaces) ✓ remove objects ✓ virtual staging of Matterport spaces using AI (easy prompts) (resident and commercial spaces) ✓ space planning of commercial spaces "How many people can fit in an open floor plan design?" (Project Genesis likely contributed to CoStar's plans to acquire Matterport by the end of 2024). Matterport CEO RJ Pittman in this video released on 14 February 2024 said: And when it comes to remodeling, we are especially excited to talk about the future. Matterport has become synonymous with digitizing and documenting physical spaces exactly as they exist today. But later in 2024, Matterport will enable you to reimagine what any space could become. We call it Project Genesis, and it incorporates Matterport's decade of machine learning and AI experience with breakthrough generative AI to bring new design and furnishing ideas to life at the click of a button. First, just imagine being able to instantly defurnish an existing space to see it as a blank canvas with all the dimensions and spatial data in place, a photorealistic 3D as-built digital twin. Or maybe you'd like to be more selective and remove unwanted objects individually, perhaps to clear out the clutter. Ultimately, spatial visualization is Matterport's greatest strength, which is why we're so focused on creating breakthrough tools to let you visualize your space with different interior design styles and furnishings automatically in true 3D. With our 2024 Winter Release, we're thrilled to unveil the next generation of digital twins, packed with fresh insights, automations, integrations and plugins, with more on the way in our next release coming in the second half of the year [2024]. Here are the six (6) "tea leaves" that I am reading to make this prediction: 1. Shopping Cart: Matterport is updating its shopping cart now (noon- 3 pm ET) today (Sunday, 2 June 2024). Many major announcements by Matterport have been preceded by "Scheduled Maintenance) 2. Quarterly Release Cycle: Matterport signaled that it wanted to be on a quarterly release cycle, when it announced its Winter 2024 Release on 15 February 2024. Four months from then will be June 15, 2024: just 12 days from today. (Matterport in this 15 February 2024 video says "first in a series" ... "New features coming to market" ... "more in our next release coming in the second half of 2024." 3. No Matterport News Since 15 May 2024: Matterport tends to issue a media release weekly (whether it truly has news or rehashes in a media release something that was previously announced (in YouTube videos, blog post, case study, etc.) Almost every time that Matterport has announced something big, there have been no other media releases in the previous two weeks. 
I could imagine that Matterport has its marketing communications team focused on the news announcement for at least two weeks leading up to major announcements. 4. Matterport Blog Post Cadence Slowed: In recent weeks, Matterport blog posts have almost been weekly. The last Matterport Blog post featuring news was on 24 May 2024. Almost every time that Matterport has announced something big, there was no Matterport blog post in the preceding week. Again, I could imagine that Matterport has its marketing communications team focused on creating a blog post for the big news. 5. Matterport Abstract Published: In early May 2024 (Thank you @Fathom3D), Matterport published: Matterport Abstract: Automatic Defurnishing of Indoor 360 Panoramas on its GitHub. While the audience for the Matterport Abstract are other developers in the computer vision space, I could imagine that Matterport would have held off this publication until after its big announcement the week of June 3rd, EXCEPT the Matterport Abstract has been accepted for two Workshops that needed to begin promoting their 18 June 2024 and 17-21 June 2024 events respectively: ✓ Workshop on Generative Models for Computer Vision at CVPR 2024 (18 June 2024) ✓ Workshop on Computer Vision in the Built Environment at CVPR 2024 [17-21 June 2024) 6. Previously Announced News (Coming Soon) is Announced Again (When Available): Matterport has frequently announced coming soon features (beta release) and then announced it again when the features are commercially available. It's been a year since Matterport first announced Project Genesis on 14 June 2023. Matterport announces major news on Mondays (such as CoStar's plans to acquire Matterport). === If I am right about Project Genesis launching Monday, June 3, 2024), then what I wrote on 14 June 2023 in the We Get Around Network Forum (and reprinted below for your convenience), is timely and relevant): --- Continued in the We Get Around Network Forum: https://www.wegetaroundnetwork.com/topic/19908/page/1/6-reasons-why-matterport-will-launch-project-genesis-monday-6-june-2024/

From KNOW-HOW to WOW
DEEP DIVE: HOW GENERATIVE AI CAN HELP DRIVING A CAR

From KNOW-HOW to WOW

Play Episode Listen Later May 29, 2024 9:38


Generative AI or large vision-language models can improve advanced driver assistance systems (ADAS) in various ways. Dr. Liu Ren of Bosch Research in Sunnyvale, California, explains how foundation models can help label training data and analyze a bird's eye view representation of a car's various data sources to improve the functional performance of ADAS. Papers: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Henrique Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren. "USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation", at CVPR 2024. Chenbin Pan, Burhan Yaman, Tommaso Nesti, Abhirup Mallik, Alessandro G. Allievi, Senem Velipasalar, Liu Ren. "VLP: Vision Language Planning for Autonomous Driving", at CVPR 2024. Chenbin Pan, Burhan Yaman, Senem Velipasalar, Liu Ren. "CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow", at CVPR 2024.

From KNOW-HOW to WOW
HOW GENERATIVE AI CAN HELP DRIVING A CAR

From KNOW-HOW to WOW

Play Episode Listen Later May 15, 2024 28:30


Prompt battle! We've all tried giving generative AI chatbots instructions that yield useful results. Recently, GenAI has rapidly expanded across various industries. But can it help drive a car? Dr. Liu Ren, Vice President and Chief Scientist for scalable and assistive AI at Bosch Research, is confident it can. He and his team are at the forefront of integrating foundation models such as large language models or vision-language models with advanced driver assistance systems (ADAS). In this episode, Liu explains to our hosts, Shuko and Geoff, how the extensive knowledge contained in foundation models can enhance the perception, prediction, and planning capabilities of ADAS. What's even more impressive is that Bosch Research's approach works despite the constrained resources typical in vehicle environments. As Liu shares his cutting-edge research, Shuko and Geoff go head-to-head in a challenge: Who can craft the cleverest prompts for GenAI? Credits: 1. Chenbin Pan, Burhan Yaman, Tommaso Nesti, Abhirup Mallik, Alessandro G Allievi, Senem Velipasalar, Liu Ren. “VLP: Vision Language Planning for Autonomous Driving”, The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024). 2. Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Henrique Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren. “USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation” , The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024). 3.Chenbin Pan, Burhan Yaman, Senem Velipasalar, Liu Ren. “CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow”, The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024). More Bosch podcasts: From Know-how to Wow with Liu Ren, all about “Visual Analytics”: https://www.youtube.com/watch?v=CmR97fW_B3I Beyond Bosch: https://podtail.com/de/podcast/beyond-bosch/

BrainX Talks
Conversation with Vivek Natarajan

BrainX Talks

Play Episode Listen Later Apr 30, 2024 49:11


Welcome to BrainX AI in Medicine Podcast series, where we bring in leaders within fields of AI and Medicine to discuss their cutting-edge research, explore boundaries of current knowledge, and provide insightful commentary on how to effectively lead the AI revolution in Medicine. Today's guest is Vivek Natarajan, who is a Research Scientist at Google leading research at the intersection of large language models (LLMs) and biomedicine. In particular, Vivek is the lead researcher behind Med-PaLM and Med-PaLM 2, which were the first AI systems to obtain passing and expert level scores on US Medical License exam questions respectively. Med-PaLM was recently published in Nature and has been featured in The Scientific American, Wall Street Journal, The Economist, among many others. More recently, Vivek also led the development of Med-PaLM M, the first demonstration of a generalist biomedical AI system and AMIE, a research AI system, which surpassed Primary Care Physicians on multiple axes pertaining to diagnostic dialogue in an randomized study conducted in the style of a virtual Objective Structured Clinical Examination (OSCE). Over the years, Vivek's research has been published in well-regarded journals and conferences like Nature, Nature Medicine, Nature Biomedical Engineering, JMLR, CVPR, and NeurIPS. It also forms the basis for several regulated medical device products under clinical trials at Google, including the NHS AI award winning breast cancer detection system Mammo Reader and the skin condition classification system DermAssist.

TIANYU2FM — 对谈未知领域
E089. AI狂暴发展的尽头在哪里?对谈天才AI科学家 ft. 林之秋

TIANYU2FM — 对谈未知领域

Play Episode Listen Later Dec 20, 2023 105:30


The development of AI amazes us and, at the same time, brings a lingering sense of losing control. Have you also suddenly realized that the release of ChatGPT was only a year ago? Over the past year, humanity seems to have stepped into the AI era at a speed we never imagined. A dazzling array of AI tools has already become part of many people's workflows, and yet AI shows no sign of slowing down, with astonishing new advances every few days. Where will AI's runaway development lead, and what problems will it bring? In this episode we once again invite AI scientist Lin Zhiqiu to talk about how he sees AI's present and future one year after its explosion. Lin Zhiqiu is a computer science PhD student at Carnegie Mellon University. He entered Cornell University at 17 and began a remarkable run: he completed double majors in computer science and mathematics in just two years, and from his sophomore year wrote prerequisite course material for master's students at Cornell Tech and graded PhD students' exams. In 2020, while still an undergraduate, he was nominated for Best Paper at CVPR, the only nominee who was still an undergraduate at the time, which became a phenomenon-level piece of news in academic circles. Three years on, his PhD journey is nearing its end; he is currently interning at Meta and training his own models, gaining experience in both research and the commercial application of AI. In this episode we start from Q*, the rumored super-powerful AI behind OpenAI's recent palace intrigue, and the Superalignment project that OpenAI chief scientist Ilya uses to control AI risk. We talk about how OpenAI seems to be unknowingly becoming the evil dragon while Meta is becoming the dragon-slaying hero. We also discuss why AI is developing so fast and where it is headed, AI and free will, how Lin Zhiqiu himself uses AI tools, and how the future of AI may affect ordinary people. So buckle up and join us for this slightly hair-raising conversation! GPTs index mentioned in the episode: allgpts.co. Closing words from 天域: at the end of our AI episode a year ago, I said I would go try ChatGPT. A year later, the two companies in my main line of work can no longer do without its help. The speed of that transformation makes me look forward to seeing what kind of relationship we will have with AI this time next year. This is also TIANYU2FM's last episode of 2023; 天宇 and 天域 wish you a happy new year in advance, and we hope 2024 is a wonderful year for you. Production: John (producer), 在洋 (editing), Alan (operations). TIANYU2FM's philosophy: a conversation with a valuable voice in every episode. We are 天宇 and 天域, close friends and companions in the pursuit of knowledge. This is a podcast created to broaden our horizons and step beyond our own limits; through conversations with people, we engage with unknown fields and knowledge. Hosts: 天宇 | Dabai (lower-pitched voice): researches Chinese and Japanese popular culture and media (articles in The Paper's Private History section, NetEase News history channel, etc.); 天域 | Jieji (higher-pitched voice): founder of a clothing e-commerce company and partner at UnDeR20. Video podcast: the TIANYU2FM video podcast is now live! Search for TIANYU2FM on Bilibili / YouTube to find us (it updates about a week later than the audio version). TIANYU2FM listener group: you are welcome to join the TIANYU2FM listener group and chat with us directly. Search for the ID TIANYU2FM on WeChat and add the note "来聊天" ("here to chat"); remember it is not the official account, but a friend request.

Proactive - Interviews for investors
HANetf launches Sprott ESG Copper Miners UCITS ETF

Proactive - Interviews for investors

Play Episode Listen Later Dec 11, 2023 6:53


HANetf Head of Research Tom Bailey visits the Proactive London studio to speak with Thomas Warner about the Sprott ESG Copper Miners UCITS ETF. Bailey explains that the new ETF offers investors exposure to copper miners in an environmentally and socially responsible manner. The ETF is significant due to copper's growing importance in the global economy, especially in renewable energy and decarbonisation efforts. For instance, electric vehicles use over twice the amount of copper compared to conventional cars. Additionally, renewable energy sources like wind and solar power require more copper than traditional coal or gas-fired power plants. Bailey highlights the expected surge in copper demand for green uses and says the ESG aspect of the ETF ensures investment in copper miners that avoid activities like coal production, aligning with investor preferences for sustainable investments. He describes the Sprott ESG Copper Miners ETF as more than just a sustainable investment option, saying that it represents a growth opportunity in the mining sector, given the anticipated rise in copper demand linked to global economic trends. It is listed on the London Stock Exchange under the ticker CVPR and will also be available in Milan and Frankfurt. #TomBailey #HANetf #ETFs #CopperMining #ESGInvesting #RenewableEnergy #ElectricVehicles #CopperDemand #GlobalEconomy #Decarbonization #EnergyTransition #SprottESGCopper #UCITSETF #LondonStockExchange #MiningSector #InvestmentTrends #GreenInvesting #SustainableFinance #CleanEnergy #FinanceInterview #EconomicGrowth #InvestorEducation #ProactiveInvestors #copper #invest #investing #investment #investor #stockmarket #stocks #stock #stockmarketnews

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Data Augmentation and Optimized Architectures for Computer Vision with Fatih Porikli - #635

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 26, 2023 52:31


Today we kick off our coverage of the 2023 CVPR conference joined by Fatih Porikli, a Senior Director of Technology at Qualcomm. In our conversation with Fatih, we covered quite a bit of ground, touching on a total of 12 papers/demos, focusing on topics like data augmentation and optimized architectures for computer vision. We explore advances in optical flow estimation networks, cross-model, and stage knowledge distillation for efficient 3D object detection, and zero-shot learning via language models for fine-grained labeling. We also discuss generative AI advancements and computer vision optimization for running large models on edge devices. Finally, we discuss objective functions, architecture design choices for neural networks, and efficiency and accuracy improvements in AI models via the techniques introduced in the papers.

Machine Learning Podcast
#046 ML Татьяна Гайнцева. Чего не хватает для AGI, как учиться в Физтехе, чтобы доучиться и почему преподавать - это круто

Machine Learning Podcast

Play Episode Listen Later May 31, 2023 66:28


This episode's guest is Tatyana Gaintseva, a PhD student at Queen Mary University of London, ex-AI researcher at Philips and Huawei, and an instructor at the Deep Learning School, ШАД (the Yandex School of Data Analysis), MSU, and several other educational projects. I once started my own path into ML with Tatyana's lectures; in particular, she taught me how to use Google Colab. This interview was recorded in December 2022 but, in my view, has not lost its relevance. Tatyana gives a fascinating account of how she got into deep learning, how she and her fellow students built projects useful to ML researchers, how she helped create the Deep Learning School (DLS), and what research she is doing now. We also discussed the prospects for the emergence of AGI, the areas of AI from which breakthroughs may come soon, and what Geoffrey Hinton and Yann LeCun think about all of this. Episode links: the Deep Learning Stories podcast episode with seismologist Artemy Novoselov (https://music.yandex.ru/album/17951713/track/108058566); Tatyana's Telegram channel DLStories | Neural networks and AI (https://t.me/dl_stories); a post in Tatyana's Telegram channel with links for studying diffusion (https://t.me/dl_stories/620); the CVPR diffusion tutorial (https://cvpr2022-tutorial-diffusion-models.github.io/); the Hugging Face diffusion course (https://huggingface.us17.list-manage.com/subscribe?u=7f57e683fa28b51bfc493d048&id=ef963b4162); Tatyana's Habr article on inductive bias (https://habr.com/ru/articles/591779/). I would be grateful for your feedback! Leave ratings and comments wherever you can, for example on Apple Podcasts; they help make the podcast better. Write what was clear, what was not, which topics to cover, which guests to invite, and in general where to take this podcast :) Subscribe to the Telegram channel "Стать специалистом по машинному обучению" (https://t.me/toBeAnMLspecialist). My Telegram (https://t.me/kmsint). You can also reach me by email: kms101@yandex.ru. The podcast is also available on YouTube (https://www.youtube.com/channel/UCzvfXLNpB2Bbf32dc7a8oDQ?) and Yandex Music (https://music.yandex.ru/album/9781458). I have also made a free course on building Telegram bots with Python and aiogram on Stepik (https://stepik.org/120924); join if you want to learn how to build Telegram bots! You can thank me with a kind word and/or a donation (https://www.tinkoff.ru/rm/kryzhanovskiy.mikhail11/NkwE718878/)

Product by Design
Sustainable Investments: Using AI and Data Analytics

Product by Design

Play Episode Listen Later May 11, 2023 43:31


Get ready for an informative and exciting episode of Product by Design, as host Kyle introduces his guest, Gopal, the CTO and head of Sust Global, a climate and data startup. In this episode, Kyle and Gopal discuss the increasing use of climate-related data and analytics in corporate reporting and risk management. They also dive into how Sust Global has developed an API and dashboarding products to help companies understand and prepare for climate risk. You'll learn about the importance of driving a shared understanding through data and how machine learning and AI can help with climate analytics. Gopal also shares valuable advice for those starting out or looking to level up in data analytics. Don't miss this insightful episode filled with expert tips on how to effectively address the challenges posed by climate change. Gopal Erinjippurath:Gopal serves as CTO and Head of Product at Sust Global, a venture focused on geospatial analytics for climate adaptation.Most recently, he led the Analytics Engineering team at Planet Labs (NYSE:PL), an integrated aerospace and data analytics company that operates history's largest commercial fleet of earth observation satellites. Planet Analytics serves a range of customers from city planning teams in governments and the World Bank to defense and intelligence functions across the world. He is known for agile engineering execution from concept to scalable high quality products. He has been an invited speaker at global industry conferences like Google Cloud Next and leading technical conferences in the machine learning space such as ICML, CVPR and NeurIPS.Links from the Show:LinkedIn: https://www.linkedin.com/in/gopalerinjippurath/ Sust Global: https://www.sustglobal.com/ From Our Partners:Future London Academy's MBA for Design Leaders: bit.ly/chiefdesignerMore by Kyle:Follow Product by Design and Kyle on TwitterFollow Product by Design and Kyle on TikTokSign up for Kyle's Product Thinking Newsletter for more updates.Kyle's writing on MediumProduct by Design on MediumLike our podcast, consider Buying Us a Coffee or supporting us on Patreon[00:00:12] Remote sensing and climate modeling for predictive analytics[00:04:42] Sports and Climate Risk for Businesses[00:09:05] Creating API and Dashboard for Specific Market Use Cases[00:13:40] Sustainability and Climate Change Risks[00:18:08] Machine learning & AI for climate change adaptation[00:22:31] Driving Shared Understanding through Data[00:26:51] The Three Dimensions of Successful Businesses[00:31:21] Challenges of Working with Earth Observation Satellites[00:35:33] Bridging gap between early adopters and mainstream users[00:40:03] Tips for Advancing in Data Analytics

TIANYU2FM — 对谈未知领域
E069.5 如何使用ChatGPT提升工作效率?ft. 林之秋

TIANYU2FM — 对谈未知领域

Play Episode Listen Later Feb 22, 2023 24:00


Lin Zhiqiu is an AI scientist and a computer science PhD student at Carnegie Mellon University, born in 1998. In every profile he reads like a textbook child prodigy: he entered Cornell University at 17, completed double majors in computer science and mathematics in two years, and from his sophomore year wrote prerequisite course material for master's students at Cornell Tech and graded PhD students' exams. In 2020, while still an undergraduate, he was nominated for Best Paper at CVPR (the International Conference on Computer Vision and Pattern Recognition), the only nominee still at the undergraduate stage, which became a phenomenon-level piece of news in academic circles. This episode is a bonus to episode 69, "ChatGPT: have we really opened Pandora's box? | A conversation on the AI frontier." Together with Lin Zhiqiu we discuss how ordinary people can use ChatGPT to boost efficiency in everyday work and life, and we run some live tests of how ChatGPT answers our questions. Beyond that, we also talk with Zhiqiu about AIGC topics, such as the recent explosion of AI painting and AI avatar generation and where these technologies may go next. And if you have tried ChatGPT yourself, please share your own productivity tips in the comments; this time your message is sure to help someone else (at least it will help us!). Preview of next Thursday's episode: if, like us, you came away from episode 69 sensing that a new technological revolution is emerging, we think a deeper look at an AI-driven future is worthwhile. In the next episode we are lucky to have Shadow, founder of Mixlab, a well-known technology and design community with more than 50,000 members, to explore the likely directions of future technology and the logic behind them. We go from the AIGC and AI boom back to the metaverse and other technologies that may bring the next revolution. Is the metaverse really dead? What is the future of AIGC? Shadow has 14 years of experience in the technology field and launched an AI-related startup nearly a decade ago, and the Mixlab community he built has attracted experts from Alibaba, NIO, MIT, Tsinghua, and more. His discussion offers both a macro perspective and plenty of technical detail. If this topic interests you, follow us; we will publish on time next Thursday! TIANYU2FM listener group: you are welcome to join the TIANYU2FM listener group and chat with us directly. Search for the ID TIANYU2FM on WeChat and add the note "来聊天" ("here to chat"); remember it is not the official account, but a friend request. Follow us on Weibo and Twitter, and bookmark our website, for more updates and episode reminders! Weibo: @tianyu2fm Twitter: @tianyu2fm Bilibili: tianyu2fm Website: https://tianyu2.fireside.fm/ What TIANYU2FM offers: a conversation with an unfamiliar field in every episode. We are 天宇 and 天域, close friends and companions in the pursuit of knowledge; this podcast was created to broaden our horizons and step beyond our own limits by engaging with unknown fields and knowledge through conversation. Production: 在洋 (editing), Alan (operations). Hosts: 天宇 | Dabai (lower-pitched voice): researches Chinese and Japanese popular culture and media (articles in The Paper's Private History section, NetEase News history channel, etc.); 天域 | Jieji (higher-pitched voice): founder of a clothing e-commerce company.

Naruhodo
Naruhodo #354 - Consumir vídeos e podcasts em velocidade 2x atrapalha o aprendizado?

Naruhodo

Play Episode Listen Later Oct 3, 2022 58:41


Some people are in the habit of watching videos or listening to audio at accelerated playback speed. But does that improve or worsen comprehension of the content? What about retention? And learning? What does science have to say about it? Check out the conversation between the curious layman, Ken Fujioka, and the PhD scientist, Altay de Souza. > LISTEN (58min 41s) *Naruhodo! is the podcast for people hungry to learn. Science, common sense, curiosities, challenges, and much more. With the curious layman, Ken Fujioka, and the PhD scientist, Altay de Souza. Editing: Reginaldo Cursino. http://naruhodo.b9.com.br *PARTNERSHIP: ALURA Go deeper once and for all: we guarantee knowledge with depth and breadth so you can become a T-shaped professional, including programming, front-end, data science, devops, UX & design, mobile, and innovation & management. Navigate your career: there are more than 1,300 courses and new releases every week, plus constant updates and improvements. Immersive content: join a community of people passionate about everything digital. Dive into the Alura community. Take advantage of the discount for Naruhodo listeners at the link: https://bit.ly/naruhodo_alura *REFERENCES Quasi Real-Time Summarization for Consumer Videos https://openaccess.thecvf.com/content_cvpr_2014/html/Zhao_Quasi_Real-Time_Summarization_2014_CVPR_paper.html Learning in double time: The effect of lecture video speed on immediate and delayed comprehension https://onlinelibrary.wiley.com/doi/abs/10.1002/acp.3899?casa_token=s0xa0adMeVwAAAAA:65s3KRUnlx52_kaQ15wN7epRsMWjpDUkP2n5kFtaMvXul5HOhwq2x8o4kobpUli1l0JY0ks8kUaNsfSG Effect Analysis of High-Speed Presentations of Educational Video Utilizing Synthetic Speech https://jaems.jp/contents/icomej/vol13/07_Nagahama.pdf Showing face in video instruction: effects on information retention, visual attention, and affect https://dl.acm.org/doi/10.1145/2556288.2557207 Speeda: adaptive speed-up for lecture videos https://dl.acm.org/doi/abs/10.1145/2658779.2658794?casa_token=PKM32C2aWAcAAAAA:_VsQczOv4QWSCUEozmblTucBS06Mu5TMA36deCH2-agrAxhvYuocfxUg8rFxeO_cj_w3ccSmOOWTyA Video Playback Speed Influence on Learning Effect From the Perspective of Personalized Adaptive Learning: A Study Based on Cognitive Load Theory https://www.frontiersin.org/articles/10.3389/fpsyg.2022.839982/full Video Speed and Interruptions in an Online Learning Environment https://www.proquest.com/openview/e8cd29a4fdf523fe239b3b4f090e4f47/1?pq-origsite=gscholar&cbl=18750&diss=y Speed Watching, Efficiency, and the New Temporalities of Digital Spectatorship https://bit.ly/3DrXD3h Going beyond the spacing effect: Does it matter how time on a task is distributed? https://journals.sagepub.com/doi/full/10.1177/17470218221113933 Playback-centric visualisations of video usage using weighted interactions to guide where to watch in an educational context https://arxiv.org/abs/2207.12903 Naruhodo #205 - Powerpoint é útil para a aprendizagem? https://www.youtube.com/watch?v=IFej_pgh9S8&ab_channel=Cient%C3%ADstica%26PodcastNaruhodo Naruhodo #97 - Alunos que se sentam no fundão aprendem menos? https://www.youtube.com/watch?v=3o0BeabUJeE&ab_channel=Cient%C3%ADstica%26PodcastNaruhodo Naruhodo #333 - Quais as consequências do excesso de reuniões virtuais? https://www.youtube.com/watch?v=XMQdUfCWKEk&ab_channel=Cient%C3%ADstica%26PodcastNaruhodo Naruhodo #204 - Tudo que percebemos está no passado? https://www.youtube.com/watch?v=GhUx4FCuTZk&t=1s&ab_channel=Cient%C3%ADstica%26PodcastNaruhodo *SUPPORT NARUHODO ON THE ORELO PLATFORM! A very important announcement: the Naruhodo podcast is now on Orelo: https://bit.ly/naruhodo-no-orelo
And it is through this platform for supporting content creators that you help Naruhodo stay on the air. You choose a monthly contribution amount and get access to exclusive content, early releases, and special perks. You can also get access to our private Telegram group and talk with me, with Altay, and with other supporters. And that's not all: every time you listen to or download an episode through Orelo, you are also chipping in a little for our project. So download the Orelo app right now at Orelo.CC or in your app store and help strengthen scientific knowledge. https://bit.ly/naruhodo-no-orelo

Argmax
11: CVPR Workshop on Autonomous Driving Keynote by Ashok Elluswamy, a Tesla engineer

Argmax

Play Episode Listen Later Sep 30, 2022 48:51


In this episode we discuss this video on how Tesla approaches collision detection with novel methods: https://youtu.be/jPCV4GKX9Dw

Dave Lee on Investing
MONSTER Tesla FSD Beta 10.69 update, Starlink coming to Tesla cars w/ James Douma #32 (Ep. 631)

Dave Lee on Investing

Play Episode Listen Later Aug 26, 2022 91:07


James Douma and I go over the latest Tesla FSD Beta 10.69 update and improvements, the Tesla stock split, and the exciting Starlink + T-Mobile announcement. Timestamps: 00:00 - Introduction 00:50 - Tesla stock split 01:57 - Starlink + T-Mobile announcements (direct phone-to-satellite connection) 13:20 - Starlink coming to Tesla cars 15:55 - Chuck Cook's unprotected left turns 29:47 - Ashok Elluswamy (head of Autopilot) presentation at CVPR'22 about Occupancy Networks, Implicit Collision Network 51:33 - Tesla FSD Beta 10.69 release notes 1:29:25 - Conclusion Ashok Elluswamy keynote at CVPR '22, Occupancy Networks, https://www.youtube.com/watch?v=jPCV4GKX9Dw Tesla FSD Beta 10.69 release notes: https://www.notateslaapp.com/software-updates/version/2022.16.3.10/release-notes James Douma on Twitter, https://twitter.com/jamesdouma James Douma on AI playlist, https://www.youtube.com/playlist?list=PLfibpgBinf9R7KIedEU3y-YjrA63LSKHX Social

Underrated ML
Interestingness predictions and getting to grips with data privacy

Underrated ML

Play Episode Listen Later Jul 26, 2022 68:52


This week we are joined by Naila Murray. Naila obtained a B.Sc. in Electrical Engineering from Princeton University in 2007. In 2012, she received her PhD from the Universitat Autonoma de Barcelona, in affiliation with the Computer Vision Center. She joined NAVER LABS Europe (then Xerox Research Centre Europe) in January 2013, working on topics including fine-grained visual categorization, image retrieval, and visual attention. From 2015 to 2019 she led the computer vision team at NLE. She currently serves as NLE's director of science. She serves/served as area chair for ICLR 2018, ICCV 2019, ICLR 2019, CVPR 2020, ECCV 2020, and programme chair for ICLR 2021. Her research interests include representation learning and multi-modal search.We discuss using sparse pairwise comparisons to learn a ranking function that is robust to outliers. We also take a look at using generative models in order to utilise once inaccessible datasets.Underrated ML Twitter: https://twitter.com/underrated_mlNaila Murray Twitter: https://twitter.com/NailaMurrayPlease let us know who you thought presented the most underrated paper in the form below: https://forms.gle/97MgHvTkXgdB41TC8Links to the papers:"Interestingness Prediction by Robust Learning to Rank" [paper]"Generative Models for Effective ML on Private Decentralized datasets" - [paper]

Build a Business Success Secrets
Climate Change with Gopal Erinjippurath CTO of Sust Global | Ep. 359

Build a Business Success Secrets

Play Episode Listen Later Jul 22, 2022 58:35


Gopal and I talk about the climate and how Sust Global is using data and analytics to make climate informed decisions.  About Gopal Erinjippurath Gopal leads Sust Global's technology and product management functions. Their mission is to develop data-driven products enabling every decision to be climate-informed, so humanity can thrive on a changing planet. Most recently, Gopal led the Analytics Engineering team at Planet Labs, an aerospace and data analytics company that operates history's largest commercial fleet of earth observation satellites. He has been an invited speaker at global industry conferences like Google Cloud Next and leading technical conferences in the machine learning space such as ICML and CVPR. Gopal holds an MS in Electrical Engineering from University of Southern California and completed the Ignite Program, connecting technologists with new commercial ventures, at the Stanford Graduate School of Business. SIGN UP FOR OUR NEWSLETTER HERE!Over 17,200 listeners and counting EPISODE LINKS: Sust Global PODCAST INFO: Apple Podcasts: EDGE on Apple Podcasts Spotify: EDGE on Spotify  RSS Feed: EDGE's RSS Feed Website: EDGE Podcast SUPPORT & CONNECT EDGE's Weekly NewsletterJoin over 17,200 listeners and sign up to receive bonus content. It's free.  Sign up here >>> Please Support this Podcast by checking out our Sponsors: Mad River Botanicals 100% certified organic CBD products. The product is controlled from seed to end product by it's owners. Use code: EDGE22 to get 10% off all your orders. Shop here>>> EDGE Podcast. A top podcast for entrepreneurs!

TIANYU2FM — 对谈未知领域
E054.5 一段不设限的科学家对谈 | AI vs 脑科学 ft. 林之秋、丘志海

TIANYU2FM — 对谈未知领域

Play Episode Listen Later Jul 20, 2022 47:26


In the main episode 54, two scholars from artificial intelligence and brain/neuroscience shared the interdisciplinary possibilities between AI and brain science. In this bonus episode, the two hosts hand the stage entirely to the guests, and you will hear more of the two scholars' free-flowing discussion, covering brain-inspired chips, leading figures in academia, scientists' everyday gossip, the difference between studying and doing research, and more. Guests for this episode: Lin Zhiqiu, born in 1998, reads like a textbook child prodigy in every profile: he entered Cornell University at 17, completed double majors in computer science and mathematics in two years, and from his sophomore year wrote prerequisite course material for master's students at Cornell Tech and graded PhD students' exams. In 2020, while still an undergraduate, he was nominated for Best Paper at CVPR (the International Conference on Computer Vision and Pattern Recognition), the only nominee still at the undergraduate stage, a phenomenon-level piece of news in academic circles at the time. Dr. Qiu Zhihai leads the Multiscale Neural Network Imaging and Manipulation research group at the Guangdong Institute of Intelligence Science and Technology. He did postdoctoral research in neuroscience and biomedicine at Stanford University, has won multiple awards at leading international conferences in neuroscience, biophysics, biomedical engineering, and optical engineering, and holds a number of patents in China and the United States, with rich research experience. Preview of next Thursday's episode: dopamine has long been regarded as the substance of happiness, so foods and training programs sold under the banner of dopamine are everywhere in today's anxiety-driven consumer environment. But what is dopamine really, and how does it affect us? In the next episode we peer into the world of neuroscience from another angle, in a conversation with Zhao Sijia, a postdoctoral neuroscientist at the University of Oxford and a well-known popular science writer. Dr. Zhao graduated from the world-class University College London; for a long time, alongside her research, she has written popular science on Zhihu, where as of July 2022 she has more than 460,000 followers, and she is the author of three popular science books. In the next episode we start from her research topics such as dopamine and Alzheimer's disease, and hear her candid account of the academic and personal realities of being a scientist and researcher, the highs and lows seen along that road. Through the conversation we get a better look at the dreams, happiness, and struggles of the individuals behind the glorious title of "scientist," and a better understanding of how our knowledge is produced; it may also be useful to friends who are still studying or already on the research path. If you are interested, follow us; we update on time every Thursday. TIANYU2FM listener group: you are welcome to join the TIANYU2FM listener group and chat with us directly. Search for the ID TIANYU2FM on WeChat and add the note "来聊天" ("here to chat"); remember it is not the official account, but a friend request. What TIANYU2FM offers: a conversation with an unfamiliar field in every episode. We are 天宇 and 天域, close friends and companions in the pursuit of knowledge; this podcast was created to broaden our horizons and step beyond our own limits by engaging with unknown fields and knowledge through conversation. Hosts: 天宇 | Dabai (lower-pitched voice): researches Chinese and Japanese popular culture and media (articles in The Paper's Private History section, NetEase News history channel, etc.); 天域 | Jieji (higher-pitched voice): founder of a clothing e-commerce company.

TIANYU2FM — 对谈未知领域
E053.5 我醉酒醒来,发现自己身处百公里外 ft. 天域

TIANYU2FM — 对谈未知领域

Play Episode Listen Later Jul 6, 2022 30:54


This bonus episode is a story that 天域 once considered submitting to 故事FM; the outline is just like the title, and its level of drama is not hard to imagine. The story happened around 2017, and since the main episode touched on clubbing and nightlife, this felt like the moment to finally tell it. If you are one of TIANYU2FM's earliest listeners, you may remember us teasing this story in a very early episode; today we finally deliver on it. 天域: I was young at the time and some of it was ill-considered, so please take it as entertainment, and go easy on me. Since graduating and starting my own small business I have basically said goodbye to drinking, so if you have only known me in recent years, please don't think my avoiding drinking occasions is an excuse. Is something lost by that? To some extent, inevitably. Alcohol really does bring a certain kind of joy, especially for someone like me who looks extroverted but is actually introverted; it can break down some social boundaries. As the saying goes, liquor gives the timid courage. In my own shallow, layman's understanding, drinkers fall into roughly two camps. One kind feels sick before getting truly drunk and letting go, and after throwing up they more or less sober up. The other kind doesn't throw up easily and can go all the way to not remembering what happened the night before. As luck would have it, I am the latter. The advantage of that constitution is that it is easier to enjoy alcohol; the problem is that whether you overdo it and cross the line depends entirely on your own willpower, and once the alcohol reaches a certain point, you have crossed the line before you can rein it in. If you have much experience drinking, you will know it can even happen when you are sitting down, pacing yourself well and thinking you can stop; then you stand up, take two steps, the alcohol spreads, and you are gone before you can pull back. Of course this is my personal pseudo-science, so if your experience differs, forgive my lack of rigor. In any case, once you are running things, every collaboration has processes and deadlines; if I cannot deliver on time I cause trouble for my partners, and in the end many things need me to be the backstop. So I rather bluntly chose a habit of basically not drinking and basically skipping pure drinking gatherings. Whether I will adjust that in the future, I do not know; if some goal I want, such as living a bit more spontaneously, calls for it, I may find a new way to coexist with alcohol. I have rambled enough; a while ago I joked with a friend that what I want most at this stage is simply to face whatever problems come up each day with a clear head. OK, enough; if you have anything to say or complain about regarding this story, feel free to roast me gently in the comments. I already have the pot lid on my head. Preview of the next episode: if you found TIANYU2FM through our science and technology episodes, the next one will be right up your alley. We have once again invited the very popular Carnegie Mellon PhD student and AI scientist Lin Zhiqiu, together with brain scientist Dr. Qiu Zhihai, leader of the Multiscale Neural Network Imaging and Manipulation research group at the Guangdong Institute of Intelligence Science and Technology, for a conversation. If you heard our episode 31, "What is frontier fundamental science doing? | High-energy physics vs cosmology," this format will feel familiar: we let two scientists from different fields collide and explore freely. We chose these two because brain science and artificial intelligence have always been intertwined; as discussed in earlier episodes, many concepts in AI neural networks were inspired by brain science. Lin Zhiqiu entered the Ivy League's Cornell University at 17, finished the undergraduate curricula of both computer science and mathematics in just two years, and began taking PhD courses from his sophomore year; that same year, at the computer science department's invitation, he taught upper-level students as a teaching assistant and wrote prerequisite course material for master's students at Cornell Tech. In 2020, while still an undergraduate, he was nominated for Best Paper at CVPR (the International Conference on Computer Vision and Pattern Recognition); of the 6,424 papers submitted worldwide that year, only 26 received Best Paper nominations, and Lin Zhiqiu was the only nominee still at the undergraduate stage, which became a phenomenon-level piece of news in academic circles. Dr. Qiu Zhihai leads the Multiscale Neural Network Imaging and Manipulation research group at the Guangdong Institute of Intelligence Science and Technology. He did postdoctoral research in neuroscience and biomedicine at Stanford University, has won multiple awards at leading international conferences in neuroscience, biophysics, biomedical engineering, and optical engineering, holds several patents in China and the United States, and was invited to write the "ultrasonic neuromodulation" chapter of Oxford University's handbook on transcranial brain modulation. He has rich research experience. In the next episode you will hear the two scientists question each other from their own fields, covering frontier advances and surprising findings in each area, humanity's trend of technological explosion and the picture of the future, and the possibilities for collaboration between AI and neuroscience. If you are interested, follow us; we update on time every Thursday. TIANYU2FM listener group: you are welcome to join the TIANYU2FM listener group and chat with us directly. Search for the ID TIANYU2FM on WeChat and add the note "来聊天" ("here to chat"); remember it is not the official account, but a friend request. What TIANYU2FM offers: a conversation with an unfamiliar field in every episode. We are 天宇 and 天域, close friends and companions in the pursuit of knowledge; this podcast was created to broaden our horizons and step beyond our own limits by engaging with unknown fields and knowledge through conversation. Hosts: 天宇 | Dabai (lower-pitched voice): researches Chinese and Japanese popular culture and media (articles in The Paper's Private History section, NetEase News history channel, etc.); 天域 | Jieji (higher-pitched voice): founder of a clothing e-commerce company.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
On The Path Towards Robot Vision with Aljosa Osep - #581

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 4, 2022 47:33


Today we wrap up our coverage of the 2022 CVPR conference joined by Aljosa Osep, a postdoc at the Technical University of Munich & Carnegie Mellon University. In our conversation with Aljosa, we explore his broader research interests in achieving robot vision, and his vision for what it will look like when that goal is achieved. The first paper we dig into is Text2Pos: Text-to-Point-Cloud Cross-Modal Localization, which proposes a cross-modal localization module that learns to align textual descriptions with localization cues in a coarse-to-fine manner. Next up, we explore the paper Forecasting from LiDAR via Future Object Detection, which proposes an end-to-end approach for detection and motion forecasting based on raw sensor measurement as opposed to ground truth tracks. Finally, we discuss Aljosa's third and final paper Opening up Open-World Tracking, which proposes a new benchmark to analyze existing efforts in multi-object tracking and constructs a baseline for these tasks. The complete show notes for this episode can be found at twimlai.com/go/581

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
More Language, Less Labeling with Kate Saenko - #580

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 27, 2022 47:01


Today we continue our CVPR series joined by Kate Saenko, an associate professor at Boston University and a consulting professor for the MIT-IBM Watson AI Lab. In our conversation with Kate, we explore her research in multimodal learning, which she spoke about at the Multimodal Learning and Applications Workshop, one of a whopping 6 workshops she spoke at. We discuss the emergence of multimodal learning, the current research frontier, and Kate's thoughts on the inherent bias in LLMs and how to deal with it. We also talk through some of the challenges that come up when building out applications, including the cost of labeling, and some of the methods she's had success with. Finally, we discuss Kate's perspective on the monopolizing of computing resources for “foundational” models, and her paper Unsupervised Domain Generalization by learning a Bridge Across Domains. The complete show notes for this episode can be found at twimlai.com/go/580

Let's Talk AI
Nonsense Sentience, Condemning GPT-4chan, DeepFake bans, CVPR Plagiarism

Let's Talk AI

Play Episode Listen Later Jun 26, 2022 43:12


Our 102nd episode with a summary and discussion of last week's big AI news! If you are a fan, we'd appreciate your feedback on Apple Podcasts, or just via email to contact@lastweekin.ai - feel free to DM us on Twitter too! Outline: (00:00) Intro (01:05) LaMDA's Sentience is Nonsense - Here's Why (2:00) Condemning the deployment of GPT-4chan (7:25) Microsoft and Meta join Google in using AI to help run their data centers (11:20) AI maturity: Only 12% of companies are 'AI Achievers' (16:45) NVIDIA & UW Introduce Factory: A Set of Physics Simulation Methods and Learning Tools for Contact-Rich Robotic Assembly (21:50) Artificial intelligence may be the only way researchers can solve the perplexing puzzle of Long COVID. It's already categorizing patients and even identifying them (25:42) Reddit Bans 'SFW' Deepfake Community (30:15) EU To Target Meta, Google, Twitter Over Deepfakes, Report Says (33:28) How to copy text from more than 10 previously published papers and get accepted to CVPR 2022 (36:30) We taught an AI to impersonate Shakespeare and Oscar Wilde – here's what it revealed about sentience (38:02) DALL-E2 Creates The World's First AI-Generated Magazine Cover For Cosmopolitan (42:15) Outro

Shock Your Potential
Conscious Capitalism - Gopal Erinjippurath

Shock Your Potential

Play Episode Listen Later Jun 22, 2022 30:10


"Today more than ever, climate is part of the frontier narrative when it comes to the geopolitical landscape or the financial regulatory landscape." (Gopal Erinjippurath) Climate change is now a fact of life on many fronts, including geopolitically and in business. As the world becomes more aware of the impact of climate change, more opportunities come up to make a difference. It is however clear that, despite there being a great deal of information out there, many businesses are unsure about how to interpret and use it constructively. Gopal Erinjippurath has been working with businesses to help them better understand the link between their businesses and the climate. Gopal serves as CTO and Head of Product at Sust Global, a venture focused on geospatial analytics for climate adaptation. Most recently, he led the Analytics Engineering team at Planet Labs (NYSE:PL), an integrated aerospace and data analytics company that operates history's largest commercial fleet of earth observation satellites. Planet Analytics serves a range of customers from city planning teams in governments and the World Bank to defense and intelligence functions across the world. He is known for agile engineering execution from concept to scalable high quality products. He has been an invited speaker at global industry conferences like Google Cloud Next and leading technical conferences in the machine learning space such as ICML, CVPR and NeurIPS. Previously Gopal managed teams working on industry-leading analytics products from early concept demonstrations to multiple customers at Captricity (Acquired by NASDAQ:SSNC) and Harvesting Inc, where he advised the CEO. Previously he led the algorithm engineering development of Dolby's (NYSE: DLB) first imaging display product, the Emmy Award winning Dolby Professional Reference Monitor. Gopal holds an MS in Electrical Engineering from the University of Southern California and completed the Ignite Program, connecting technologists with new commercial ventures, at the Stanford Graduate School of Business. In today's episode, Gopal talks about what they do as a business, and why it is important that businesses begin to seek awareness of the impact of climate change on their financial bottom line. Listen in! Media links www.sustglobal.com https://www.linkedin.com/in/gopalerinjippurath/ I categorize myself as a scientist, primarily looking at geospatial data, which is data linked to location. We are enabling businesses to understand the impact of the changing climate on their holdings and their financial bottom line. Today more than ever, climate is part of the frontier narrative when it comes to the geopolitical landscape or the financial regulatory landscape. Businesses' access to clean, valid and trusted data about the changing climate across the world has been limited. Sust Global intends to transform that and enable the whole economy to use a shared set of commonly understood climate data parameters. I started my career in multimedia, and later got inspired to pursue a career focused on environmental data. Climate is a global phenomenon, and the changing climate doesn't affect just one person in a specific way; rather, it affects different people in different ways. As climate becomes mainstream in the dialogue at geopolitical as well as regional and civic scales, we begin to see a public desire to invest climate-consciously.
When businesses need to enable climate-conscious avenues, they seek to use climate data in new and creative ways. Pragmatic solutions that take the changing climate into account also need to account for the state of affairs today. Being able to identify the conflicting demands of the present and the future is what an entrepreneur has to figure out in this fast-moving ecosystem. Commercial break. There's a gap between the community that understands climate and the community that wants to use climate data but doesn't understand it. We are trying to close that gap in the venues that we serve, but it will take a while, so the time is now for us to get more educated and aware of what climate actually is. Previously, the big migration angle was around opportunity and geo-arbitrage, but I believe climate is introducing a third dimension. When it comes to opportunity around the emergence of these capabilities, there has been unprecedented awareness around discoveries and the use of data. There is a growing awareness, and people have a sense of purpose; they start to look at what they can do with their skills to address this problem, and benefit in the process. There is also the emergence of cross-functional and interdisciplinary teams, which creates some unique opportunities. One can pick up enough foundational knowledge to create a shared understanding, and then work with others in different domains to create new opportunities. In this connected world where we're all working remotely, it's a great time to pick up new domains, because big business opportunities exist at the intersection of domains. Product development is inherently an iterative process, and it requires being comfortable with imperfection in the early days. Work with others who are comfortable with imperfection, so that they can be encouraging and partner with you on that journey. ………………………………………………… Do you want to be a go-to expert that news reporters, anchors and media producers turn to? Are you a media professional looking for credible, reliable and timely guests? Shock Your Media Potential is here for you. Shock Your Media Potential is a one of a kind platform that connects vetted experts with news professionals around the globe. As part of the launch of the platform, CEO Michael Sherlock, along with co-host Eddie Luisi, stage manager for Good Morning America, has interviewed 25 media personalities and professionals to ask them the questions you need to know the answers to in order to become more newsworthy, pitch your story better, and get invited back again and again, and much more. Some of their guests are household names, with exceptional on-camera careers. Others are award-winning directors, producers, camera operators, audio engineers, celebrity hair and makeup professionals, and so much more. To learn more about our platform and our conference today, go to https://www.shockyourmediapotential.com

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Optical Flow Estimation, Panoptic Segmentation, and Vision Transformers with Fatih Porikli - #579

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 20, 2022 51:17


Today we kick off our annual coverage of the CVPR conference joined by Fatih Porikli, Senior Director of Engineering at Qualcomm AI Research. In our conversation with Fatih, we explore a trio of CVPR-accepted papers, as well as a pair of upcoming workshops at the event. The first paper, Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation, presents a novel framework to integrate semantic and instance contexts for panoptic segmentation. Next up, we discuss Imposing Consistency for Optical Flow Estimation, a paper that introduces novel and effective consistency strategies for optical flow estimation. The final paper we discuss is IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, which proposes a transformer architecture to simultaneously estimate depths, normals, spatially-varying albedo, roughness, and lighting from a single image of an indoor scene. For each paper, we explore the motivations and challenges and get concrete examples to demonstrate each problem and solution presented. The complete show notes for this episode can be found at twimlai.com/go/579

With Jason Barnard...
The Importance of Behavioural Analytics (Ravi Yada and Jason Barnard)

With Jason Barnard...

Play Episode Listen Later May 30, 2022 27:16


Ravi Yada talks with Jason Barnard about the importance of behavioural analytics. Ravi Yada was born in southern India, grew up in the Midwest, and lives in the Pacific Northwest. He is an engineer, inventor, and public speaker. He currently works as a product manager on Microsoft Clarity. Ravi has worked in analytics, search, computer vision, and machine learning for nearly a decade and has developed numerous products at Microsoft, most notably Bing Visual Search. Ravi is an avid public speaker on AI products and technology, speaking at Microsoft Build, CVPR, DMEXCO, SMX, @Scale, and NRF. As a marketer, when building a website, you need to focus not on how YOU would personally use it, but how your users will - you have to walk in the user's digital shoes, as Ravi Yada puts it. That is where Behavioural Analytics comes into play! User behaviour analysis is the star of this episode. Jason Barnard and Ravi Yada discuss how understanding user behaviour on websites can help you improve user experience and that will lead to more conversions. And, as luck would have it, Microsoft Clarity is an amazing tool for analysing user behaviour. Microsoft Clarity is a behavioural analytics tool that uses click data, scroll tracking, session replays and heat maps to show how users interact with your website. Just like HotJar and MouseFlow, but it's free for any size website with any volume of traffic! Why? Ravi reveals that during the episode. As always, the show ends with passing the baton… Ravi groovily passes over to next week's brilliant guest, Jason Davis. What you'll learn from Ravi Yada: 00:00 Ravi Yada and Jason Barnard 00:30 Remembering Bill Slawski 01:45 Ravi Yada's Brand SERP from Bing 03:09 Jason Barnard's Brand SERP from Bing 03:54 What is Behavioural Analytics? 05:08 Understanding User Behaviour with Microsoft Clarity 06:20 What can Microsoft Clarity Do? 08:06 The Famous Bing Purple Ad Story! 10:54 What are Dead Clicks and Rage Clicks? 12:52 Excessive Scrolling: Why Users Scroll too Much on Your Website 13:24 What is the Average Fold on Your Website and How Important is It? 18:38 Ravi Yada's Two Types of User Behaviour Analysis 22:51 Four Types of Readers Online (or maybe 5) 24:16 Does Microsoft Clarity Use Machine Learning? 28:06 Passing the Baton: Ravi Yada to Jason Davis. Relevant Content: Digital Marketing Conference - Behavioral Analytics: Stop guessing. Get clarity. by Ravi Yada - YouTube. Growth Marketing Toolbox - GMT 268: What is Microsoft Clarity? | Growth Marketing Toolbox. AI Journal Publication - Understand your customers with Behavioral Analytics - The AI Journal. DMEXCO conference - DMEXCO 2019 Trusted Partnership Interview Series: Reprise - Microsoft Advertising. This episode was recorded live on video May 24th 2022 Recorded live at Kalicube Tuesdays (Digital Marketing Livestream Event Series). Watch the video now >>

TIANYU2FM — 对谈未知领域
E049. How far has artificial intelligence come? A conversation at the AI frontier ft. Lin Zhiqiu

TIANYU2FM — 对谈未知领域

Play Episode Listen Later May 4, 2022 83:54


Unpacking concepts like big data, artificial intelligence, and deep learning, and sharing the current state of AI development. Over the past decade, artificial intelligence has become a topic everyone knows about, and it is deservedly one of the hottest research fields. What we see on Douyin, how food-delivery apps rank options, even the image rendered when we pull out a phone to take a photo: AI algorithms are involved in all of it. This means AI is already shaping how we sense and understand the world; what we see and hear, what we think and do, is mediated by AI applications. So the importance of individuals learning more about the current state and development of AI is self-evident. **This episode's guest: Lin Zhiqiu** PhD student in computer vision at Carnegie Mellon University and AI scientist. Born in 1998, Lin Zhiqiu is almost a textbook prodigy. At 17 he was admitted to Cornell University, an American Ivy League school. He finished the undergraduate coursework of both the computer science and mathematics majors in just two years and began taking PhD courses in his sophomore year. Among the several thousand high achievers in his college, he ranked in the top three. In 2020, while still an undergraduate, he received a best paper nomination at CVPR (the International Conference on Computer Vision and Pattern Recognition), often called the Oscars of computer vision. Of the 6,424 papers submitted by researchers worldwide that year, only 26 received best paper nominations, and Zhiqiu was the only nominee still at the undergraduate stage; it became something of a phenomenon in academic circles at the time. In this episode, Lin Zhiqiu explains the various concepts involved in AI, such as the definitions and frameworks of deep learning and neural networks, and shares the current state of AI research and applications. We also asked Zhiqiu, as a researcher at the frontier, about his views on the future of AI, including crises facing the field and whether AI could ever develop emotions or consciousness. His explanations fundamentally helped us re-organize the scattered concepts we so often hear, and we hope you enjoy them too. **Next update preview** Next Thursday we will release a bonus segment to this episode, in which we talk with Lin Zhiqiu about autonomous driving, the specific focus of his current research. Zhiqiu explains how far away so-called full self-driving really is and where the current difficulties lie. If you are interested in this topic, follow us; we will publish on time next Thursday. **TIANYU2FM listener group** You are welcome to join the TIANYU2FM listener group and chat with us directly. Search the ID TIANYU2FM on WeChat and add the note "here to chat". Remember it is not the official account; add it as a friend. **What TIANYU2FM offers: a conversation with an unfamiliar field in every episode** We are Tianyu (天宇) and Tianyu (天域), close friends and companions in the pursuit of knowledge. This podcast was created to broaden our horizons and step outside our own limits, engaging with unknown fields and knowledge through conversations. About the hosts: Tianyu (天宇) | Dabai (lower-pitched voice): researcher of Chinese and Japanese popular culture and media (articles published in The Paper's Private History column, NetEase News history channel, and elsewhere). Tianyu (天域) | Jieji (higher-pitched voice): founder of an apparel e-commerce company.

TIANYU2FM — 对谈未知领域
Listeners chat together about TIANYU2FM | April 2022

TIANYU2FM — 对谈未知领域

Play Episode Listen Later Apr 27, 2022 69:31


The thinking behind TIANYU2FM, its origins, the choices of format and positioning, and the original intent. A while ago, TIANYU2FM held its third live group chat with the listener community. In that chat we talked about how we understand podcast content, some of our experience so far, and some reflections: for example, how to think about the difference between solo and multi-host podcasts, and how to make good content with a relatively manageable time investment, among other questions listeners raised. We also shared the thinking behind TIANYU2FM, how the show came about, the choices of format and positioning, and our original intent. If you are curious about what goes on behind a podcast, this episode should not disappoint. In today's world, information spreads ever faster and wider, and follower counts in the millions seem ordinary for an influencer, but being in the same space with our listeners gives us an especially real sense of responsibility and happiness. We both feel lucky and honored to create this kind of content with a close friend; having like-minded friends is itself precious. TIANYU2FM has become a bridge connecting Tianyu and Tianyu, our guests, and you who are listening. We hope more social connection brings more mutual understanding, and new knowledge from cross-disciplinary collisions. Next Thursday's preview: the next episode takes on a topic we have long wanted to cover but never got around to, artificial intelligence. The value and importance of the topic are self-evident today, but how to build an adequate understanding of such a vast discipline within one episode has long troubled us, so we are fortunate to have the outstanding AI researcher Lin Zhiqiu join us for a conversation. Lin Zhiqiu is a PhD student in computer vision at the Robotics Institute of Carnegie Mellon University. In 2020, while still an undergraduate, he received a best paper nomination at CVPR (the International Conference on Computer Vision and Pattern Recognition), often called the Oscars of computer vision. Of the 6,424 papers submitted by researchers worldwide that year, only 26 received best paper nominations; Lin Zhiqiu was the conference's only nominee still at the undergraduate stage and its youngest first author, which became something of a phenomenon in academic circles at the time. Lin Zhiqiu entered Cornell University, an American Ivy League school, at 17; he completed the entire undergraduate curricula of both computer science and mathematics in just two years and began taking PhD courses in his sophomore year. That same year, at the invitation of the computer science department, he lectured to upper-level students as a teaching assistant, wrote prerequisite course material for master's students at Cornell Tech, and even graded PhD students' exams in an advanced machine learning course. Among the several thousand high achievers in his college, he ranked in the top three and carried the college's banner at commencement. In the next episode, we will ask Zhiqiu to share his understanding of AI's basic concepts and frameworks and the most cutting-edge research directions, and we will discuss with him the meaning and future direction of AI. If you are interested in this topic, follow us; we will publish on time next Thursday! You are welcome to join the TIANYU2FM listener group to chat with us and take part in future listener activities. Search the ID TIANYU2FM on WeChat and add the note "here to chat". Remember it is not the official account; add it as a friend. What TIANYU2FM offers: a conversation with an unfamiliar field in every episode. We are Tianyu (天宇) and Tianyu (天域), close friends and companions in the pursuit of knowledge. This podcast was created to broaden our horizons and step outside our own limits, engaging with unknown fields and knowledge through conversations. About the hosts: Tianyu (天宇) | Dabai (lower-pitched voice): researcher of Chinese and Japanese popular culture and media (articles published in The Paper's Private History column, NetEase News history channel, and elsewhere). Tianyu (天域) | Jieji (higher-pitched voice): founder of an apparel e-commerce company.

Unboxing AI: The Podcast for Computer Vision Engineers
The Pace of Progress in Academia and Industry - with Prof. Lihi Zelnik-Manor, Faculty of Electrical and Computer Engineering, Technion

Unboxing AI: The Podcast for Computer Vision Engineers

Play Episode Listen Later Mar 28, 2022 64:18


Listen in as Prof. Lihi Zelnik-Manor shares insights from a career that bridges academia and industry. Lihi walks us through everything from the history of the computer vision community to the interplay between industry and academia today. Lihi Zelnik-Manor is a Professor in the Faculty of Electrical and Computer Engineering at the Technion and former General Manager of the Alibaba DAMO Israel Lab. Prof. Zelnik-Manor holds a PhD and MSc (with honors) in Computer Science from the Weizmann Institute of Science and a BSc (summa cum laude) in Mechanical Engineering from the Technion. Her main area of expertise is Computer Vision. Prof. Zelnik-Manor has made extensive contributions to the community, serving as General Chair of CVPR 2021, Program Chair of CVPR 2016, and Associate Editor at TPAMI; she has served multiple times as Area Chair at CVPR and ECCV, and was on the award committees of ACCV'18 and CVPR'19. Looking forward, she will serve as General Chair of ECCV'22 and as Program Chair of ICCV'25.

Stanford MLSys Seminar
2/24/22 #56 Fait Poms - Interactive Model Development

Stanford MLSys Seminar

Play Episode Listen Later Feb 28, 2022 55:54


Fait Poms - A vision for interactive model development: efficient machine learning by bringing domain experts in the loop Building computer vision models today is an exercise in patience--days to weeks for human annotators to label data, hours to days to train and evaluate models, weeks to months of iteration to reach a production model. Without tolerance for this timeline or access to the massive compute and human resources required, building an accurate model can be challenging if not impossible. In this talk, we discuss a vision for interactive model development with iteration cycles of minutes, not weeks. We believe the key to this is integrating the domain expert at key points in the model building cycle and leveraging supervision cues above just example-level annotation. We will discuss our recent progress toward aspects of this goal: judiciously choosing when to use the machine and when to use the domain expert for fast, low label budget model training (CVPR 2021, ICCV 2021), building confidence in model performance with low-shot validation (ICCV 2021 Oral), and some initial tools for rapidly defining correctness criteria.
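As a toy illustration of the "choose when to use the machine and when to use the domain expert" idea above, here is a minimal confidence-based routing sketch in Python. It is our own simplified illustration, not the method from the CVPR/ICCV papers mentioned; the threshold and function names are placeholders.

    import numpy as np

    def route_to_expert(probs, confidence_threshold=0.9):
        # probs: (num_examples, num_classes) predicted class probabilities.
        # Auto-label confident predictions; send uncertain ones to the domain expert.
        confidence = probs.max(axis=1)
        machine_idx = np.where(confidence >= confidence_threshold)[0]
        expert_idx = np.where(confidence < confidence_threshold)[0]
        return machine_idx, expert_idx

    probs = np.random.dirichlet(np.ones(10), size=1000)   # fake predictions for 1000 images
    machine_idx, expert_idx = route_to_expert(probs)
    print(len(machine_idx), "auto-labeled;", len(expert_idx), "routed to the expert")

In practice the hard part is exactly what the talk addresses: choosing that boundary so a small label budget goes where the expert adds the most value.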

Counting Sand
Machine Learning: Your Right to Explainability

Counting Sand

Play Episode Listen Later Jan 11, 2022 31:39


How do we make the next generation of machine learning models explainable? How do you start finding new kinds of models that might be explainable? Where do you even start thinking about that process from a research perspective? Nikos begins with a discussion of how we make decisions in general. In the scientific world, we mostly reason through statistical or cause-and-effect scenarios. We can predict outcomes and train our models to produce the results we traditionally expect. He then discusses other early pioneers in this work; for example, back in the 70s, a rules engine was developed to help clinicians make diagnoses. It turns out that humans are very complex and hard to codify. Dr. Charles Forgy wrote his thesis on the Rete algorithm, which is what modern-day rules-based engines stem from. After the AI winter period came the introduction of neural networks that would encode the rules. This created an explainability problem: why was a given rule created? Neural networks build a mathematically weighted model of the data, evaluated against the outcome. Without the ability to open up the network and determine why some data was weighted more heavily than other data, explaining the results we see has remained a challenge. There is also a concern from the European Union General Data Protection Regulation (GDPR), under which a human has the right to obtain meaningful information about the logic involved, commonly interpreted as the right to an explanation. We want to look at explainability through two factors: a local point of view and a global point of view. The global objective is to extract a general summary that is representative of some specific data set, so we explain the whole model and not just local decisions. The local objective is to explain a single prediction, a single individual observation in the data: given a decision from a neural network, a classifier, or a regression algorithm, the objective is to explain just that one observation. There are five problems that present themselves in explainability: instability, transparency, adversarial attacks, privacy, and the analyst perspective. For instability, consider heat maps: they are very sensitive to hyperparameters, meaning the way we tuned the network, so how we adjusted the sensitivity changes the interpretation. Transparency becomes more difficult the more accurate machine learning gets: machine learning models, and neural networks in particular, are black boxes with very high dimensionality, and their prediction accuracy tends to be inversely proportional to their explainability. For adversarial attacks, imagine that interpretability might enable people or programs to manipulate the system: if one knows, for instance, that having three credit cards can increase the chance of getting a loan, one can game the system, increasing the chance of getting the loan without really increasing the probability of repaying it. Privacy can impact your access to the original data, especially in complex systems where boundaries exist between companies; you might not have the ability to access the original data. Lastly, the analyst perspective: when a human gets involved to explain the system, important questions include where to start first and how to ensure the interpretation aligns with how the model actually behaved.
There are also systems where the model serves multiple uses, and the human has to understand which use a given result was produced for. These are some specific ways we have found that create the complexity and challenges of explainability with machine learning models. We continue to learn and adjust based on those learnings. This is a very interesting and important topic that we will continue to explore. Citations: Dr. Charles Forgy (1979), On The Efficient Implementation of Production Systems, Carnegie Mellon University, ProQuest Dissertations Publishing, 1979, 7919143. Nadia Burkart, Marco F. Huber (2020), A Survey on the Explainability of Supervised Machine Learning, arXiv:2011.07876 (cs). Further Reading: https://openaccess.thecvf.com/content_CVPR_2019/papers/Pope_Explainability_Methods_for_Graph_Convolutional_Neural_Networks_CVPR_2019_paper.pdf https://towardsdatascience.com/explainable-deep-neural-networks-2f40b89d4d6f Nikos' Papers: https://www.mdpi.com/2079-9292/8/8/832/htm https://link.springer.com/article/10.1007/s11423-020-09858-2 https://arxiv.org/pdf/2011.07876.pdf https://arxiv.org/pdf/2110.09467.pdf Host: Angelo Kastroulis. Executive Producer: Kerri Patterson; Producer: Leslie Jennings Rowley; Communications Strategist: Albert Perrotta; Audio Engineer: Ryan Thompson. Music: All Things Grow by Oliver Worth
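To make the local-versus-global distinction concrete, here is a minimal sketch of a local explanation: a gradient-based saliency map for one prediction of one model. It assumes PyTorch and an off-the-shelf pretrained classifier purely for illustration; it is not the specific approach discussed in the episode.

    import torch
    from torchvision import models

    # An off-the-shelf classifier stands in for "the black box" here.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

    # A single observation we want a *local* explanation for (batch of size 1).
    x = torch.rand(1, 3, 224, 224, requires_grad=True)

    # Forward pass, then back-propagate the score of the predicted class.
    scores = model(x)
    predicted_class = scores.argmax(dim=1).item()
    scores[0, predicted_class].backward()

    # Saliency: how sensitive the prediction is to each input pixel.
    # This is where the instability problem shows up: the map can change
    # noticeably with preprocessing and hyperparameter choices.
    saliency = x.grad.abs().max(dim=1).values.squeeze()
    print(saliency.shape)  # torch.Size([224, 224])

A global explanation, by contrast, would have to summarize the model's behavior over the whole data set rather than a single input.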

Humans of AI: Stories, Not Stats
S2E8: Georgia Gkioxari with Dhruv Batra

Humans of AI: Stories, Not Stats

Play Episode Listen Later Sep 7, 2021 62:09


Georgia Gkioxari is a Research Scientist at Facebook AI Research and the Program Chair for CVPR 2021. Find out more about her at her homepage. Humans of AI: Stories, Not Stats is an interview series with AI researchers to get to know them better as people. We don't talk about AI or their work or the stats of their life like what college they went to. They share what they think about, what they are insecure about, what they get excited about. They share the stories of their day-to-day life. Videos of these interviews are available at humanstories.ai. The host of this episode is Dhruv Batra, an Associate Professor in the School of Interactive Computing at Georgia Tech and a Research Scientist at Facebook AI Research (FAIR). Find out more about him at his homepage or follow him on Twitter. This interview was recorded on 1st April 2021. --- Send in a voice message: https://anchor.fm/humanstoriesai/message

Autonocast
#229: The No Girls Allowed Discussion Episode

Autonocast

Play Episode Listen Later Jul 13, 2021 39:18


With Kirsten out exploring the western wilds, Alex and Ed run amok and revert to their old habits of spending way too much time discussing Tesla. First up, the lads review the latest NHTSA order to report all ADAS and AV-involved crashes, which doesn't mention Tesla specifically but is clearly very much aimed at that company. Further discussion ensues about Andrej Karpathy's recent CVPR talk, the new Plaid Model S's absurd performance and even more absurd user interface.

Yannic Kilcher Videos (Audio Only)
[ML News] CVPR bans social media paper promotion | AI restores Rembrandt | GPU prices down

Yannic Kilcher Videos (Audio Only)

Play Episode Listen Later Jul 5, 2021 18:26


#cvpr #socialmedia #machinelearning In this week's ML news we look at CVPR's controversial action to ban paper promotions on social media during the review phase, among other things! OUTLINE: 0:00 - Intro & Overview 0:25 - CVPR bans social media paper discussions 5:10 - WalMart uses AI to suggest substitutions 6:05 - NVIDIA releases Alias-Free GAN 7:30 - Confession Video in Myanmar possibly a DeepFake 8:50 - AI restores Rembrandt painting 10:40 - AI for healthcare not problem-free yet 11:50 - ML interviews book 12:15 - NVIDIA canvas turns sketches into paintings 13:00 - GPU prices down after crypto shock 13:30 - Facebook AI improves shopping experience 14:05 - DeepLab2 released on GitHub 14:35 - Toxic Language Models: Nobody cares 16:55 - Does AI have common sense? References: CVPR forbids social media promotion https://twitter.com/wjscheirer/status... WalMart uses AI to substitute out-of-stock products https://www.supermarketnews.com/techn... NVIDIA releases Alias-Free GAN https://nvlabs.github.io/alias-free-gan/ Myanmar Politician's confession could be DeepFake https://www.wired.com/story/opinion-t... Rembrandt restored using AI https://www.smithsonianmag.com/smart-... AI in healthcare still shaky http://www.greenvillebusinessmag.com/... https://www.theverge.com/2021/6/22/22... ML interviews book https://huyenchip.com/ml-interviews-b... NVIDIA Canvas Beta available https://blogs.nvidia.com/blog/2021/06... GPU prices down as China cracks down on Crypto https://www.theregister.com/2021/06/2... Facebook AI's big goal of improving shopping https://ai.facebook.com/blog/advancin... GoogleAI releases DeepLab2 https://github.com/google-research/de... Toxic Language Model: Nobody cares https://arxiv.org/pdf/2105.03023.pdf AI has no common sense https://www.analyticsinsight.net/inca... https://6b.eleuther.ai/ Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/yannic-ki... BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Vector Quantization for NN Compression with Julieta Martinez - #498

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 5, 2021 39:34


Today we're joined by Julieta Martinez, a senior research scientist at recently announced startup Waabi. Julieta was a keynote speaker at the recent LatinX in AI workshop at CVPR, and our conversation focuses on her talk "What do Large-Scale Visual Search and Neural Network Compression have in Common," which shows that multiple ideas from large-scale visual search can be used to achieve state-of-the-art neural network compression. We explore the commonality between large databases and dealing with high dimensional, many-parameter neural networks, the advantages of using product quantization, and how that plays out when using it to compress a neural network. We also dig into another paper Julieta presented at the conference, Deep Multi-Task Learning for Joint Localization, Perception, and Prediction, which details an architecture that is able to reuse computation between the three tasks, and is thus able to correct localization errors efficiently. The complete show notes for this episode can be found at twimlai.com/go/498.
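For a feel for the product-quantization idea mentioned in the talk title (split a weight matrix into sub-vectors, cluster them, and store only centroid indices plus a small codebook), here is a rough numpy/scikit-learn sketch. It is a simplification under our own assumptions, with made-up sizes, and not Julieta's or Waabi's actual implementation.

    import numpy as np
    from sklearn.cluster import KMeans

    def product_quantize(weights, num_subvectors=4, num_centroids=256):
        # Compress a 2-D weight matrix: each row becomes num_subvectors one-byte codes.
        rows, cols = weights.shape
        assert cols % num_subvectors == 0
        sub_dim = cols // num_subvectors
        codebooks, codes = [], []
        for i in range(num_subvectors):
            block = weights[:, i * sub_dim:(i + 1) * sub_dim]          # (rows, sub_dim)
            km = KMeans(n_clusters=num_centroids, n_init=4).fit(block)
            codebooks.append(km.cluster_centers_)                      # (num_centroids, sub_dim)
            codes.append(km.labels_.astype(np.uint8))                  # (rows,)
        return codebooks, codes

    def reconstruct(codebooks, codes):
        return np.hstack([cb[c] for cb, c in zip(codebooks, codes)])

    W = np.random.randn(1024, 256).astype(np.float32)
    codebooks, codes = product_quantize(W)
    W_hat = reconstruct(codebooks, codes)
    print("relative reconstruction error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))

Storage drops from rows x cols floats to rows x num_subvectors bytes plus the shared codebooks, which is the same trick large-scale visual search uses for approximate nearest-neighbor lookups.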

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Deep Unsupervised Learning for Climate Informatics with Claire Monteleoni - #497

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 1, 2021 40:30


Today we continue our CVPR 2021 coverage joined by Claire Monteleoni, an associate professor at the University of Colorado Boulder. We cover quite a bit of ground in our conversation with Claire, including her journey down the path from environmental activist to one of the leading climate informatics researchers in the world. We explore her current research interests, and the available opportunities in applying machine learning to climate informatics, including the interesting position of doing ML from a data-rich environment. Finally, we dig into the evolution of climate science-focused events and conferences, as well as the keynote Claire gave at the EarthVision workshop at CVPR, "Deep Unsupervised Learning for Climate Informatics," which focused on semi- and unsupervised deep learning approaches to studying rare and extreme climate events. The complete show notes for this episode can be found at twimlai.com/go/497.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Skip-Convolutions for Efficient Video Processing with Amir Habibian - #496

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 28, 2021 46:15


Today we kick off our CVPR coverage joined by Amir Habibian, a senior staff engineer manager at Qualcomm Technologies. In our conversation with Amir, whose research primarily focuses on video perception, we discuss a few papers they presented at the event. We explore the paper Skip-Convolutions for Efficient Video Processing, which looks at training discrete variables end-to-end in visual neural networks. We also discuss his FrameExit paper, which proposes a conditional early exiting framework for efficient video recognition. The complete show notes for this episode can be found at twimlai.com/go/496.
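To picture why skipping computation between video frames can be cheap, here is a toy sketch in the spirit of the skip-convolution idea: recompute outputs only where the frame-to-frame residual is large and reuse the previous output elsewhere. The gating rule, threshold, and names are our own simplifications, not the paper's actual method, and a real implementation would restrict the convolution itself to the changed locations to actually save FLOPs.

    import torch
    import torch.nn.functional as F

    def skip_conv(prev_frame, curr_frame, prev_out, weight, threshold=0.05):
        # Gate on how much each spatial location changed since the previous frame.
        residual = (curr_frame - prev_frame).abs().mean(dim=1, keepdim=True)  # (N, 1, H, W)
        changed = (residual > threshold).float()
        # Full convolution shown for clarity; the point of skipping is to avoid
        # this work at unchanged locations.
        full_out = F.conv2d(curr_frame, weight, padding=1)
        return changed * full_out + (1 - changed) * prev_out

    weight = torch.randn(8, 3, 3, 3)
    f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    out0 = F.conv2d(f0, weight, padding=1)
    out1 = skip_conv(f0, f1, out0, weight)
    print(out1.shape)  # torch.Size([1, 8, 64, 64])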

Humans in the Loop: the Frontiers of Data Science
CVPR Panel 1: The Future of Datasets

Humans in the Loop: the Frontiers of Data Science

Play Episode Listen Later Jun 23, 2021 62:48


For this edition of CVPR, Activeloop is proud to host two panels. Listen to the first panel on the future of datasets. Tune in to the conversation of leading experts from Google (Jianing Wei), Princeton (Olga Russakovsky), Microsoft (Siddhartha Sen), RISELab at UC Berkeley (Joseph Gonzalez), and HuggingFace (Jeff Boudier).

Humans in the Loop: the Frontiers of Data Science
CVPR panel 2: Next-gen ML Infrastructure

Humans in the Loop: the Frontiers of Data Science

Play Episode Listen Later Jun 23, 2021 83:00


Activeloop, the dataset optimization company, is excited to be a part of CVPR 2021! For this edition of CVPR, Activeloop is proud to host two panels. Listen to the second panel on Next-gen ML Infrastructure. Tune in to the conversation of leading experts from Activeloop (Davit Buniatyan), OctoML (Tianqi Chen), Weights & Biases (Lukas Biewald), YOLOv5 (Glenn Jocher) and more. Tune in and enjoy.

Remo Uherek
#31: Andrej Karpathy (Tesla): CVPR 2021 Workshop on Autonomous Vehicles (Supercut)

Remo Uherek

Play Episode Listen Later Jun 22, 2021 27:20


A supercut of Andrej Karpathy's CVPR 2021 Workshop on Autonomous Vehicles on June 20, 2021. I've cut out all the fillers, pauses and fluff. It's 30 percent shorter without any loss of information. Enjoy! Check out my course on personal finance and investing https://courses.remo.org/p/building-wealth (use code "TESLA50" for 50% off) For detailed show notes, check out the YouTube description to this video: https://youtu.be/2blLi3T4EGw Social: Blog: https://remo.org Twitter: https://twitter.com/remouherek I'm Remo Uherek, investor, writer, and former startup founder, based in Basel, Switzerland. Retired from startups after 15 years. Today, I'm a private investor, help to run a real estate portfolio and write about health, wealth, love, and happiness. I teach online courses and am obsessed with innovation, Tesla and Elon Musk. Please subscribe if you want to hear more episodes like this. (This episode was originally published on YouTube: https://youtu.be/2blLi3T4EGw) --- Send in a voice message: https://anchor.fm/remouherek/message

Tesla Daily: Tesla News & Analysis
Karpathy Talks Tesla Vision, Supercomputer + AI Day, Leaked Email, Subscriptions, Plaid, Berlin (06.21.21)

Tesla Daily: Tesla News & Analysis

Play Episode Listen Later Jun 22, 2021 20:49


➤ Tesla's Senior Director of Artificial Intelligence, Andrej Karpathy, presents at CVPR's 2021 workshop on autonomous driving, sharing new details about Tesla's progress on autonomy, the decision to remove radar, Tesla's supercomputer, and more ➤ Link to full presentation: https://youtu.be/eOL_rCK59ZI?t=28292 ➤ Previous episode on Tesla's supercomputer: https://youtu.be/M7git-yhtno ➤ Elon Musk provides update on Tesla's AI Day ➤ What to expect from Tesla's AI Day: https://youtu.be/ceYK-0sgO2k ➤ Reported leaked email from Elon Musk ahead of quarter end ➤ Tesla updates mobile app to include subscription upgrades ➤ Motor Trend reports more details on Model S Plaid ➤ Giga Berlin timeline from newly published documents ➤ Jerome Guillen sells TSLA stock after departure Twitter: https://www.twitter.com/teslapodcast Patreon: https://www.patreon.com/tesladailypodcast Tesla Referral: https://ts.la/robert47283 Plaid producer Who Why Executive producer Jeremy Cooke Executive producer Troy Cherasaro Executive producer Andre/Maria Kent Executive producer Jeff Sheets Executive producer Jessie Chimni Executive producer Tesla Summary Executive producer Jeffrey Yu Music by Evan Schaeffer Disclosure: Rob Maurer is long TSLA stock & derivatives

Machine Learning Street Talk
#55 Self-Supervised Vision Models (Dr. Ishan Misra - FAIR).

Machine Learning Street Talk

Play Episode Listen Later Jun 21, 2021 96:21


Dr. Ishan Misra is a Research Scientist at Facebook AI Research where he works on Computer Vision and Machine Learning. His main research interest is reducing the need for human supervision, and indeed, human knowledge in visual learning systems. He finished his PhD at the Robotics Institute at Carnegie Mellon. He has done stints at Microsoft Research, INRIA and Yale. His bachelor's degree is in computer science, where he achieved the highest GPA in his cohort. Ishan is fast becoming a prolific scientist, already with more than 3000 citations under his belt and co-authoring with Yann LeCun, the godfather of deep learning. Today though we will be focusing on an exciting cluster of recent papers around unsupervised representation learning for computer vision released from FAIR. These are: DINO: Emerging Properties in Self-Supervised Vision Transformers, BARLOW TWINS: Self-Supervised Learning via Redundancy Reduction and PAWS: Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples. All of these papers are hot off the press, just being officially released in the last month or so. Many of you will remember PIRL: Self-Supervised Learning of Pretext-Invariant Representations, of which Ishan was the primary author in 2019. References: Shuffle and Learn - https://arxiv.org/abs/1603.08561 DepthContrast - https://arxiv.org/abs/2101.02691 DINO - https://arxiv.org/abs/2104.14294 Barlow Twins - https://arxiv.org/abs/2103.03230 SwAV - https://arxiv.org/abs/2006.09882 PIRL - https://arxiv.org/abs/1912.01991 AVID - https://arxiv.org/abs/2004.12943 (best paper candidate at CVPR'21 (just announced over the weekend) - http://cvpr2021.thecvf.com/node/290) Alexei (Alyosha) Efros http://people.eecs.berkeley.edu/~efros/ http://www.cs.cmu.edu/~tmalisie/projects/nips09/ Exemplar networks https://arxiv.org/abs/1406.6909 The bitter lesson - Rich Sutton http://www.incompleteideas.net/IncIdeas/BitterLesson.html Machine Teaching: A New Paradigm for Building Machine Learning Systems https://arxiv.org/abs/1707.06742 POET https://arxiv.org/pdf/1901.01753.pdf
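As background on the "redundancy reduction" objective named in the Barlow Twins paper above, here is a minimal PyTorch sketch of that loss: push the cross-correlation matrix of two augmented views' embeddings toward the identity. It is a paraphrase of the published objective for illustration, not FAIR's released code, and the batch size, embedding dimension, and lambda are arbitrary.

    import torch

    def barlow_twins_loss(z1, z2, lambd=5e-3):
        # z1, z2: (batch, dim) embeddings of two augmented views of the same images.
        n, d = z1.shape
        z1 = (z1 - z1.mean(0)) / z1.std(0)                      # standardize each dimension
        z2 = (z2 - z2.mean(0)) / z2.std(0)
        c = (z1.T @ z2) / n                                      # (dim, dim) cross-correlation matrix
        on_diag = (torch.diagonal(c) - 1).pow(2).sum()           # invariance: diagonal -> 1
        off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy reduction: off-diagonal -> 0
        return on_diag + lambd * off_diag

    z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
    print(barlow_twins_loss(z1, z2))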

Dave Lee on Investing
Tesla's Andrej Karpathy Reveals Major FSD Update (Ep. 370)

Dave Lee on Investing

Play Episode Listen Later Jun 21, 2021 9:43


Andrej Karpathy (Tesla's head of AI) shared a huge update on Tesla FSD at the CVPR virtual computer vision conference. I share my impressions and why this is important.

Den of Rich
Evgeny Burnaev | Евгений Бурнаев

Den of Rich

Play Episode Listen Later May 27, 2021 84:09


Evgeny Burnaev, Ph.D. Associate Professor of Center for Computational and Data-Intensive Science and Engineering (CDISE) at Skolkovo Institute of Science and Technology (Skoltech). Evgeny graduated from the Moscow Institute of Physics and Technology in 2006. After getting a Candidate of Sciences degree from the Institute for Information Transmission Problem in 2008, he stayed with the Institute as a head of the Data Analysis and Predictive Modeling Lab. Since 2007 Evgeny carried out a number of successful industrial projects with Airbus, SAFT, IHI, and Sahara Force India Formula 1 team among others. The corresponding data analysis algorithms, developed by Evgeny and his scientific group, formed a core of the algorithmic software library for metamodeling and optimization. Thanks to the developed functionality, engineers can construct fast mathematical approximations to long-running computer codes (realizing physical models) based on available data and perform design space exploration for trade-off studies. The software library passed the final Technology Readiness Level certification in Airbus. According to Airbus experts, the application of the library "provides the reduction of up to 10% of lead time and cost in several areas of the aircraft design process". Nowadays a spin-off company Datadvance develops a Software platform for Design Space Exploration with GUI based on this algorithmic core. Evgeny's current research focuses on the development of new algorithms in machine learning and artificial intelligence such as deep networks for an approximation of physical models, generative modeling, and manifold learning, with applications to computer vision and 3D reconstruction, neurovisualization. The results are published in top computer science conferences (ICML, ICLR, NeurIPS, CVPR, ICCV, and ECCV) and journals. Evgeny Burnaev was honored with several awards for his research, including the Moscow Government Prize for Young Scientists in the category for the Transmission, Storage, Processing and Protection of Information for leading the project "The development of methods for predictive analytics for processing industrial, biomedical and financial data", the Geometry Processing Dataset Award for the work "ABC Dataset: A Big CAD Model Dataset For Geometric Deep Learning", Symposium on Geometry Processing (2019), the Best Paper Award for the research in eSports at the IEEE Internet of People conference (2019), the Ilya Segalovich Yandex Science Prize "The best research director of postgraduate students in the field of computer sciences" (2020), and the Best Paper Award for the research on modeling of point clouds and predicting properties of 3D shapes at the Int. Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR) (2020). FIND EVGENY ON SOCIAL MEDIA LinkedIn | Facebook | Instagram | Twitter © Copyright 2022 Den of Rich. All rights reserved.

Den of Rich
Ivan Oseledets | Иван Оселедец

Den of Rich

Play Episode Listen Later Apr 26, 2021 88:16


Ivan Oseledets is a Professor of the Center for Computational and Data-Intensive Science and Engineering at Skoltech. Ivan graduated from the Moscow Institute of Physics and Technology in 2006, got a Candidate of Sciences degree in 2007, and a Doctor of Sciences in 2012, both from the Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences. He joined Skoltech CDISE in 2013. Ivan's research covers a broad range of topics. He proposed a new decomposition of high-dimensional arrays (tensors), the tensor-train decomposition, and developed many efficient algorithms for solving high-dimensional problems. These algorithms are used in different areas of chemistry, biology, data analysis and machine learning. His current research focuses on the development of new algorithms in machine learning and artificial intelligence such as the construction of adversarial examples, the theory of generative adversarial networks, and the compression of neural networks. This has resulted in publications in top computer science conferences such as ICML, NIPS, ICLR, CVPR, RecSys, ACL and ICDM. Ivan is an Associate Editor of the SIAM Journal on Mathematics of Data Science, the SIAM Journal on Scientific Computing, and Advances in Computational Mathematics (Springer). He is also an area chair of the ICLR 2020 conference. Ivan has received several awards for his research and industrial cooperation, including two gold medals of the Russian Academy of Sciences (for students in 2005 and young researchers in 2009), the Dynasty Foundation award (2012), the SIAM Outstanding Paper Prize (2018), the Russian President Award for young researchers in science and innovation (2018), the Ilya Segalovich award for Best PhD thesis supervisor (2019), the Best Professor award from Skoltech (2019), and the best cooperation project leader award from Huawei (2015, 2017). He has also been a PI and Co-PI of several grants and industrial projects (230 million rubles since 2017). In 2021, Ivan became one of the winners of the Humboldt Research Award, an award given by the Alexander von Humboldt Foundation of Germany. Ivan is actively involved in education and research supervision: he introduced and is teaching three courses in the Skoltech curriculum, and five of his PhD students have successfully defended their theses, including two PhD students at Skoltech. FIND IVAN ON SOCIAL MEDIA LinkedIn | Facebook | Instagram | Twitter © Copyright 2022 Den of Rich. All rights reserved.
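Since tensor-train decomposition is the centerpiece of that biography, a compact way to see the idea is the sequential-SVD construction: reshape, truncate an SVD, and carry the remainder to the next mode. The sketch below is a bare-bones TT-SVD under our own assumptions (a fixed maximum rank and no error control), not Oseledets' reference implementation.

    import numpy as np

    def tt_svd(tensor, max_rank=4):
        # Decompose a d-way tensor into a list of 3-way TT cores via sequential SVD.
        shape = tensor.shape
        cores, rank = [], 1
        remainder = tensor.reshape(rank * shape[0], -1)
        for k in range(len(shape) - 1):
            u, s, vt = np.linalg.svd(remainder, full_matrices=False)
            r = min(max_rank, len(s))
            cores.append(u[:, :r].reshape(rank, shape[k], r))
            remainder = (np.diag(s[:r]) @ vt[:r]).reshape(r * shape[k + 1], -1)
            rank = r
        cores.append(remainder.reshape(rank, shape[-1], 1))
        return cores

    X = np.random.rand(4, 5, 6, 7)
    cores = tt_svd(X)
    print([c.shape for c in cores])  # e.g. [(1, 4, 4), (4, 5, 4), (4, 6, 4), (4, 7, 1)]

Each core has shape (rank_in, mode_size, rank_out), so a huge d-way tensor is stored as d small three-way arrays.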

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Trends in Graph Machine Learning with Michael Bronstein - #446

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jan 11, 2021 70:52


Today we're back with the final episode of AI Rewind joined by Michael Bronstein, a professor at Imperial College London and the Head of Graph Machine Learning at Twitter. In our conversation with Michael, we touch on his thoughts about the year in Machine Learning overall, including GPT-3 and Implicit Neural Representations, but spend a major chunk of time on the sub-field of Graph Machine Learning. We talk through the application of Graph ML across domains like physics and bioinformatics, and the tools to look out for. Finally, we discuss what Michael thinks is in store for 2021, including Graph ML applied to molecule discovery and non-human communication translation.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Trends in Computer Vision with Pavan Turaga - #444

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jan 4, 2021 70:29


AI Rewind continues today as we're joined by Pavan Turaga, Associate Professor in both the Departments of Arts, Media, and Engineering & Electrical Engineering, and the Interim Director of the School of Arts, Media, and Engineering at Arizona State University. Pavan, who joined us back in June to talk through his work from CVPR '20, Invariance, Geometry and Deep Neural Networks, is back to walk us through the trends he's seen in Computer Vision last year. We explore the revival of physics-based thinking about scenes, differential rendering, the best papers, and where the field is going in the near future. We want to hear from you! Send your thoughts on the year that was 2020 below in the comments, or via Twitter at @samcharrington or @twimlai. The complete show notes for this episode can be found at twimlai.com/go/444

Chai Time Data Science
Torsten Sattler: CV, Mixed Reality, Localisation & Robotics #125

Chai Time Data Science

Play Episode Listen Later Dec 10, 2020 69:26


Video Version: https://youtu.be/Ki3SiI0nzF8 Subscribe here to the newsletter: https://tinyletter.com/sanyambhutani In this episode, Sanyam Bhutani interviews Torsten Sattler, a Senior CV Researcher at CTU. Torsten's research interests span 3D Computer Vision, Localization, AR, VR, and self-driving cars. Like all research conversations, this was a really insightful one. They also dive into his recent research as a proxy for understanding how Torsten approaches research problems. Links: http://openaccess.thecvf.com/content_cvpr_2018/papers/Schonberger_Semantic_Visual_Localization_CVPR_2018_paper.pdf https://openaccess.thecvf.com/content_CVPR_2019/papers/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.pdf Follow: Torsten Sattler: https://twitter.com/sattlertorsten Sanyam Bhutani: https://twitter.com/bhutanisanyam1 Blog: sanyambhutani.com About: https://sanyambhutani.com/tag/chaitimedatascience/ A show for Interviews with Practitioners, Kagglers & Researchers and all things Data Science, hosted by Sanyam Bhutani. You can expect weekly episodes, each available as video, podcast, and blog post. Intro track: Flow by LiQWYD https://soundcloud.com/liqwyd #GoogleAI #Research #MultiModal

Machine Learning Podcast - Jay Shah
Siddha Ganju, Forbes 30 Under 30 and Self-Driving Architect @NVIDIA

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Nov 23, 2020 34:02


Siddha is a self-driving architect at NVIDIA and she also guides teams at NASA as an AI domain expert. She was featured on the Forbes 30 Under 30 list in 2019. Previously, she developed deep learning models for resource-constrained edge devices at DeepVision. She earned her Master's degree from Carnegie Mellon University, and her work ranges from visual question answering to generative adversarial networks to gathering insights from CERN's petabyte-scale data; it has been published at top-tier conferences including CVPR and NeurIPS. In this interview she shares her insights on: 1. How she got started with ML and Computer Vision. 2. How graduate study and internships helped her gain perspective on theoretical and practical knowledge in AI. 3. How she keeps herself updated in this ever-changing research domain. 4. What the book she recently wrote is about and who should read it. 5. What new trends to look for in AI in 2020 and beyond. 6. General advice for beginners in ML, some DOs and DON'Ts. About the Host: Jay is a PhD student at Arizona State University, doing research on building Interpretable AI models for Medical Diagnosis. Jay Shah: https://www.linkedin.com/in/shahjay22/ You can reach out to https://www.public.asu.edu/~jgshah1/ for any queries. Stay tuned for upcoming webinars! ***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

Machine Learning Engineered
Why Multi-Modality is the Future of Machine Learning w/ Letitia Parcalabescu (University of Heidelberg, AI Coffee Break)

Machine Learning Engineered

Play Episode Listen Later Nov 10, 2020 91:48


Letitia Parcalabescu is a PhD candidate at the University of Heidelberg focused on multi-modal machine learning, specifically with vision and language. Learn more about Letitia: https://www.cl.uni-heidelberg.de/~parcalabescu/ https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA Every Thursday I send out the most useful things I've learned, curated specifically for the busy machine learning engineer. Sign up here: http://bitly.com/mle-newsletter Follow Charlie on Twitter: https://twitter.com/CharlieYouAI Take the Giving What We Can Pledge: https://www.givingwhatwecan.org/ Subscribe to ML Engineered: https://mlengineered.com/listen Comments? Questions? Submit them here: http://bitly.com/mle-survey Timestamps: 01:30 Follow Charlie on Twitter (https://twitter.com/CharlieYouAI) 02:40 Letitia Parcalabescu 03:55 How she got started in CS and ML 07:20 What is multi-modal machine learning? (https://www.youtube.com/playlist?list=PLpZBeKTZRGPNKxoNaeMD9GViU_aH_HJab) 16:55 Most exciting use-cases for ML 20:45 The 5 stages of machine understanding (https://www.youtube.com/watch?v=-niprVHNrgI) 23:15 The future of multi-modal ML (GPT-50?) 27:00 The importance of communicating AI breakthroughs to the general public 37:40 Positive applications of the future “GPT-50” 43:35 Letitia's CVPR paper on phrase grounding (https://openaccess.thecvf.com/content_CVPRW_2020/papers/w56/Parcalabescu_Exploring_Phrase_Grounding_Without_Training_Contextualisation_and_Extension_to_Text-Based_CVPRW_2020_paper.pdf) 53:15 ViLBERT: is attention all you need in multi-modal ML? (https://arxiv.org/abs/1908.02265) 57:00 Preventing “modality dominance” 01:03:25 How she keeps up in such a fast-moving field 01:10:50 Why she started her AI Coffee Break YouTube Channel (https://www.youtube.com/c/AICoffeeBreakwithLetitiaParcalabescu/) 01:18:10 Rapid fire questions Links: https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA (AI Coffee Break YouTube Channel) https://openaccess.thecvf.com/content_CVPRW_2020/papers/w56/Parcalabescu_Exploring_Phrase_Grounding_Without_Training_Contextualisation_and_Extension_to_Text-Based_CVPRW_2020_paper.pdf (Exploring Phrase Grounding without Training) https://www.youtube.com/playlist?list=PLpZBeKTZRGPNKxoNaeMD9GViU_aH_HJab (AI Coffee Break series on Multi-Modal learning) https://www.youtube.com/watch?v=-niprVHNrgI (What does it take for an AI to understand language?) https://arxiv.org/abs/1908.02265 (ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations)

Machine Learning Podcast - Jay Shah
Best resources to remain updated to Machine Learning Research | Siddha Ganju, @NVIDIA

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Nov 5, 2020 2:39


Siddha is a self-driving architect at NVIDIA, and she also guides teams at NASA as an AI domain expert. She was featured on the Forbes 30 Under 30 list in 2019. Previously, she developed deep learning models for resource-constrained edge devices at DeepVision. She earned her Master's degree from Carnegie Mellon University, and her work ranges from visual question answering to generative adversarial networks to gathering insights from CERN's petabyte-scale data, and has been published at top-tier conferences including CVPR and NeurIPS. Hosted by: Jay Shah: https://www.linkedin.com/in/shahjay22/ You can reach out at https://www.public.asu.edu/~jgshah1/ for any queries. Stay tuned for upcoming webinars! ***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any institution or its affiliates of such video content.***

Retail PowerChat
Retail PowerChat #041: Using AI to Drive Retail Profitability with Matt Scott, Malong, Part I

Retail PowerChat

Play Episode Listen Later Oct 26, 2020 11:43


In this episode, Matt Scott, CEO and Co-founder of Malong, begins with the major new trends in retail and persistent issues. Matt explains how AI and ML are helping retail become more efficient and provides a refresher on Malong’s loss prevention solutions. Matt concludes with Malong’s improvements to enhance accuracy and implement the technique of “nudging.” White Paper Link: https://openaccess.thecvf.com/content_CVPR_2020/papers/Yu_Deformable_Siamese_Attention_Networks_for_Visual_Object_Tracking_CVPR_2020_paper.pdf

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
That's a VIBE: ML for Human Pose and Shape Estimation with Nikos Athanasiou, Muhammed Kocabas, Michael Black - #409

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Sep 14, 2020 44:51


Today we’re joined by Nikos Athanasiou, Muhammed Kocabas, Ph.D. students, and Michael Black, Director of the Max Planck Institute for Intelligent Systems.  We caught up with the group to explore their paper VIBE: Video Inference for Human Body Pose and Shape Estimation, which they submitted to CVPR 2020. In our conversation, we explore the problem that they’re trying to solve through an adversarial learning framework, the datasets (AMASS) that they’re building upon, the core elements that separate this work from its predecessors in this area of research, and the results they’ve seen through their experiments and testing.  The complete show notes for this episode can be found at https://twimlai.com/go/409. Register for TWIMLfest today!

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
3D Deep Learning with PyTorch 3D w/ Georgia Gkioxari - #408

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Sep 10, 2020 36:46


Today we’re joined by Georgia Gkioxari, a research scientist at Facebook AI Research.  Georgia was hand-picked by the TWIML community to discuss her work on the recently released open-source library PyTorch3D. In our conversation, Georgia describes her experiences as a computer vision researcher prior to the 2012 deep learning explosion, and how the entire landscape has changed since then.  Georgia walks us through the user experience of PyTorch3D, while also detailing who the target audience is, why the library is useful, and how it fits in the broad goal of giving computers better means of perception. Finally, Georgia gives us a look at what it’s like to be a co-chair for CVPR 2021 and the challenges with updating the peer review process for the larger academic conferences.  The complete show notes for this episode can be found at twimlai.com/go/408.
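As a loose illustration of the user experience discussed in the episode, the sketch below shows a minimal differentiable 3D operation with PyTorch3D. It is only a hedged example: the icosphere meshes, sample counts, and choice of Chamfer loss are hypothetical stand-ins, not anything prescribed in the conversation, and it assumes a standard pytorch3d installation.

import torch
from pytorch3d.utils import ico_sphere
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two icospheres at different subdivision levels stand in for a predicted
# mesh and a target mesh; PyTorch3D represents meshes as batched tensors.
pred_mesh = ico_sphere(level=2, device=device)
target_mesh = ico_sphere(level=4, device=device)

# Sampling points from a mesh surface is differentiable, so gradients can
# flow back to the mesh vertices during optimization.
pred_points = sample_points_from_meshes(pred_mesh, num_samples=5000)
target_points = sample_points_from_meshes(target_mesh, num_samples=5000)

# Chamfer distance between the sampled point clouds is a common 3D
# reconstruction loss.
loss, _ = chamfer_distance(pred_points, target_points)
print(loss.item())

In a real pipeline the predicted mesh would come from a learned model rather than ico_sphere, and the loss would drive a standard PyTorch optimizer step.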

Underrated ML
Naila Murray - Interestingness predictions and getting to grips with data privacy

Underrated ML

Play Episode Listen Later Aug 14, 2020 68:52


This week we are joined by Naila Murray. Naila obtained a B.Sc. in Electrical Engineering from Princeton University in 2007. In 2012, she received her PhD from the Universitat Autonoma de Barcelona, in affiliation with the Computer Vision Center. She joined NAVER LABS Europe (then Xerox Research Centre Europe) in January 2013, working on topics including fine-grained visual categorization, image retrieval, and visual attention. From 2015 to 2019 she led the computer vision team at NLE. She currently serves as NLE's director of science. She has served as an area chair for ICLR 2018, ICCV 2019, ICLR 2019, CVPR 2020, and ECCV 2020, and as a programme chair for ICLR 2021. Her research interests include representation learning and multi-modal search. We discuss using sparse pairwise comparisons to learn a ranking function that is robust to outliers. We also take a look at using generative models to utilise once-inaccessible datasets. Underrated ML Twitter: https://twitter.com/underrated_ml Naila Murray Twitter: https://twitter.com/NailaMurray Please let us know who you thought presented the most underrated paper in the form below: https://forms.gle/97MgHvTkXgdB41TC8 Links to the papers: "Interestingness Prediction by Robust Learning to Rank" [paper] "Generative Models for Effective ML on Private Decentralized datasets" - [paper]

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Human-AI Collaboration for Creativity with Devi Parikh - #399

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Aug 10, 2020 44:56


Today we’re joined by Devi Parikh, Associate Professor at the School of Interactive Computing at Georgia Tech, and research scientist at Facebook AI Research (FAIR).  While Devi’s work is more broadly focused on computer vision applications, we caught up to discuss her presentation on AI and Creativity at the CV for Fashion, Art and Design workshop at CVPR 2020. In our conversation, we touch on Devi’s definition of creativity,  explore multiple ways that AI could impact the creative process for artists, and help humans become more creative. We investigate tools like casual creator for preference prediction, neuro-symbolic generative art, and visual journaling.  The complete show notes for this episode can be found at twimlai.com/talk/399. A quick reminder that this is your last chance to register for tomorrow’s Model Explainability Forum! For more information, visit https://twimlai.com/explainabilityforum.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
The Case for Hardware-ML Model Co-design with Diana Marculescu - #391

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 13, 2020 44:58


Today we’re joined by Diana Marculescu, Department Chair and Professor of Electrical and Computer Engineering at University of Texas at Austin.  We caught up with Diana to discuss her work on hardware-aware machine learning. In particular, we explore her keynote, “Putting the “Machine” Back in Machine Learning: The Case for Hardware-ML Model Co-design” from the Efficient Deep Learning in Computer Vision workshop at this year’s CVPR conference.  In our conversation, we explore how her research group is focusing on making ML models more efficient so that they run better on current hardware systems, and what components and techniques they’re using to achieve true co-design. We also discuss her work with Neural architecture search, how this fits into the edge vs cloud conversation, and her thoughts on the longevity of deep learning research.  The complete show notes for this episode can be found at twimlai.com/talk/391.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Computer Vision for Remote AR with Flora Tasse - #390

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 9, 2020 40:54


Today we conclude our CVPR coverage joined by Flora Tasse, Head of Computer Vision & AI Research at Streem.  Flora, a keynote speaker at the AR/VR workshop at CVPR, walks us through some of the interesting use cases at the intersection of AI, computer vision, and augmented reality technology. In our conversation, we discuss how Flora’s interest in a career in AR/VR developed, the origin of her company Selerio, which was eventually acquired by Streem, and her current research. We also spend time exploring the difficulties associated with building 3D mesh environments, extracting metadata from those environments, the challenges of pose estimation, and other papers that caught Flora’s eye from the conference. The complete show notes for this episode can be found at twimlai.com/talk/390. For our complete CVPR series, head to twimlai.com/cvpr20.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Deep Learning for Automatic Basketball Video Production with Julian Quiroga - #389

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 6, 2020 42:15


Today we return to our coverage of the 2020 CVPR conference with a conversation with Julian Quiroga, a Computer Vision Team Lead at Genius Sports. Julian presented his recent paper “As Seen on TV: Automatic Basketball Video Production using Gaussian-based Actionness and Game States Recognition” at the CVSports workshop. We jump right into the paper, discussing details like camera setups and angles, detection and localization of the figures on the court (players, refs, and of course, the ball), and the role that deep learning plays in the process. We also break down how this work applies to different sports, and the ways that Julian is looking to improve on this work for better accuracy.  The complete show notes for this episode can be found at twimlai.com/talk/389. To follow along with our entire CVPR series, visit twimlai.com/cvpr20. Thanks again to our friends at Qualcomm for their support of the podcast and sponsorship of this series!

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
How External Auditing is Changing the Facial Recognition Landscape with Deb Raji - #388

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 2, 2020 81:47


Today we’re taking a break from our CVPR coverage to bring you this interview with Deb Raji, a Technology Fellow at the AI Now Institute at New York University. Over the past week or two, there have been quite a few major news stories in the AI community, including the self-imposed moratorium on facial recognition technology from Amazon, IBM and Microsoft. There was also the release of PULSE, a controversial computer vision model that ultimately sparked a Twitter firestorm involving Yann LeCun and AI ethics researchers, including friend of the show, Timnit Gebru. The controversy echoed into the broader AI community, eventually leading to Yann LeCun's departure from Twitter. In our conversation with Deb, we dig into these stories in depth, discussing the origins of Deb's work on the Gender Shades project, how subsequent work put a spotlight on the potential harms of facial recognition technology, and who holds responsibility for dealing with underlying bias issues in datasets. The complete show notes for this episode can be found at twimlai.com/talk/388.

Pod de Engineer
Have You Read the Much-Talked-About SF Novel The Three-Body Problem? A Catch-Up on RepNet and AtCoder

Pod de Engineer

Play Episode Listen Later Jul 1, 2020 35:11


We shared our impressions of reading The Three-Body Problem, the hugely popular Chinese SF novel, discussed RepNet, which Google recently presented at the international conference CVPR 2020, and talked about recently reaching the blue rank on AtCoder, a competitive programming site. https://www.amazon.co.jp/dp/B07TS9XTSD https://sites.google.com/view/repnet/home https://atcoder.jp/ https://qiita.com/e869120/items/eb50fdaece12be418faa https://qiita.com/e869120/items/acba3dd8649d913102b5 https://atcoder.jp/contests/dp https://atcoder.jp/contests/tdpc --- Send in a voice message: https://anchor.fm/pod-de-engineer/message

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
AI for High-Stakes Decision Making with Hima Lakkaraju - #387

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 29, 2020 45:54


Today we’re joined by Hima Lakkaraju, an Assistant Professor at Harvard University with appointments in both the Business School and the Department of Computer Science. At CVPR, Hima was a keynote speaker at the Fair, Data-Efficient and Trusted Computer Vision Workshop, where she spoke on Understanding the Perils of Black Box Explanations. Hima talks us through her presentation, which focuses on the unreliability of perturbation-based explainability techniques such as LIME and SHAP, as well as how attacks on these explanation methods can be carried out and what those attacks look like. We also discuss people's tendency to trust computer systems and their outputs, her thoughts on collaborator (and former TWIML guest) Cynthia Rudin's theory that we shouldn't use black-box algorithms, and much more. For the complete show notes, visit twimlai.com/talk/387. For our continuing CVPR Coverage, visit twimlai.com/cvpr20.
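For readers unfamiliar with what perturbation-based explanation means here, the sketch below is a simplified, from-scratch illustration of the idea behind LIME-style methods: perturb the input, query the black-box model, and fit a weighted local linear surrogate whose coefficients act as the explanation. The model, data, and kernel width are hypothetical stand-ins, not anything from Hima's talk; her argument is that exactly this reliance on local perturbations is what makes such explanations possible to game.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def black_box_model(X: np.ndarray) -> np.ndarray:
    """Stand-in for an opaque classifier's probability output."""
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 0.5 * X[:, 1])))

def explain_instance(x: np.ndarray, n_samples: int = 1000, scale: float = 0.3) -> np.ndarray:
    # Sample perturbations around the instance being explained.
    perturbations = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    preds = black_box_model(perturbations)
    # Weight samples by proximity to the original instance (an RBF kernel).
    weights = np.exp(-np.sum((perturbations - x) ** 2, axis=1) / (2 * scale ** 2))
    # Fit a weighted linear surrogate; its coefficients are the local attributions.
    surrogate = Ridge(alpha=1.0).fit(perturbations, preds, sample_weight=weights)
    return surrogate.coef_

x = np.array([0.8, -0.2])
print(explain_instance(x))  # per-feature local importance scores

A model owner who controls the classifier can detect and behave innocuously on such off-distribution perturbation queries while discriminating on real inputs, which is in the spirit of the attacks discussed in the episode.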

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Invariance, Geometry and Deep Neural Networks with Pavan Turaga - #386

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 25, 2020 47:14


We continue our CVPR coverage with today’s guest, Pavan Turaga, Associate Professor at Arizona State University, with dual appointments as the Director of the Geometric Media Lab, and Interim Director of the School of Arts, Media, and Engineering. Pavan gave a keynote presentation at the Differential Geometry in CV and ML Workshop, speaking on Revisiting Invariants with Geometry and Deep Learning. In our conversation, we go in-depth on Pavan’s research integrating physics-based principles into computer vision. We also discuss the context of the term “invariant,” and the role of architectural, loss function, and data constraints on models. Pavan also contextualizes this work in relation to Hinton’s similar Capsule Network research. Check out the complete show notes for this episode at twimlai.com/talk/386.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Channel Gating for Cheaper and More Accurate Neural Nets with Babak Ehteshami Bejnordi - #385

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jun 22, 2020 55:58


Today we’re joined by Babak Ehteshami Bejnordi, a Research Scientist at Qualcomm. Babak works closely with former guest Max Welling and is currently focused on conditional computation, which is the main driver for today’s conversation. We dig into a few papers in great detail including one from this year’s CVPR conference, Conditional Channel Gated Networks for Task-Aware Continual Learning.  We also discuss the paper TimeGate: Conditional Gating of Segments in Long-range Activities, and another paper from this year’s ICLR conference, Batch-Shaping for Learning Conditional Channel Gated Networks. We cover how gates are used to drive efficiency and accuracy, while decreasing model size, how this research manifests into actual products, and more!  For more information on the episode, visit twimlai.com/talk/385. To follow along with the CVPR 2020 Series, visit twimlai.com/cvpr20.  Thanks to Qualcomm for sponsoring today’s episode and the CVPR 2020 Series!

Agile Giants: Lessons from Corporate Innovators
Episode 32: Investing CVC Dollars into Frontier Tech Startups

Agile Giants: Lessons from Corporate Innovators

Play Episode Listen Later May 19, 2020 25:52


Frontier tech - emerging technologies such as robotics, computer vision, and machine learning - is transforming so many of our industries. Established companies need to understand these technologies and figure out the right way to react. On this episode of Agile Giants, I sit down with Vin Tang, the Co-Founder of Samsung's Q Fund. The fund's tagline is "we're more CVPR, less CES," referencing the annual research event, the Conference on Computer Vision and Pattern Recognition, versus the Consumer Electronics Show in Las Vegas. This is clearly a different type of venture fund, but there are a lot of lessons here for other corporate venture investors.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Disrupting DeepFakes: Adversarial Attacks Against Conditional Image Translation Networks with Nataniel Ruiz - #375

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later May 14, 2020 42:42


Today we’re joined by Nataniel Ruiz, a PhD Student in the Image & Video Computing group at Boston University.  We caught up with Nataniel to discuss his paper “Disrupting DeepFakes: Adversarial Attacks Against Conditional Image Translation Networks and Facial Manipulation Systems,” which will be presented at the upcoming CVPR conference. In our conversation, we discuss the concept of this work, which essentially injects noise into an image to disrupt a generative model’s ability to manipulate said image. We also explore some of the challenging parts of implementing this work, a few potential scenarios in which this could be deployed, and the broader contributions that went into this work.  The complete show notes for this episode can be found at twimlai.com/talk/375.

Tentang Data
S01E04 The Technology Behind Kinect

Tentang Data

Play Episode Listen Later Mar 27, 2020 27:07


Curious about the technology behind Kinect and the latest developments in human pose estimation? I talk with Ikhsanul Habibie (https://people.mpi-inf.mpg.de/~ihabibie/) about his research at the Max Planck Institute for Informatics, which has been accepted at CVPR, one of the world's top computer vision conferences, including its applications to games like FIFA and to films that require computer graphics work, such as Avengers.

Leaders In AI
[CVPR 2019 AI Talk] Dr. Ji Liu, Director @ Kuaishou Seattle AI Lab & FeDA Lab

Leaders In AI

Play Episode Listen Later Jul 17, 2019 14:34


Ji Liu is an assistant professor at the University of Rochester and the Director of the Kuaishou Seattle AI Lab. Prior to Kuaishou, he was the Principal Scientist at Tencent. His research interests include machine learning, optimization, and reinforcement learning, with an emphasis on big-data scenarios. Professor Liu is also interested in applying technology to solve real-world problems in areas such as image analysis, game AI design, and vision understanding. The asynchronous parallel algorithm he proposed has been widely used in many machine learning platforms, such as Google's TensorFlow. In 2018, he was also named an "Innovator Under 35 in China" by MIT Technology Review. This episode is a live recording of our interview with Dr. Ji Liu at the 2019 CVPR conference. He shared his experience in deep learning research and how it is applied to optimize the user experience at Kuaishou.

DataCast
Episode 17: Computer Vision Research with Genevieve Patterson

DataCast

Play Episode Listen Later Jul 16, 2019 53:18


Show Notes: (2:09) Genevieve discussed her undergraduate experience studying Electrical Engineering and Mathematics at the University of Arizona. (3:18) Genevieve talked about her Master's work in Electrical Machines from the University of Tokyo. (6:59) Genevieve went in-depth about her research work on transverse-flux motor design during her Master's, in which she won the Outstanding Paper award at ICEMS 2009. (11:39) Genevieve talked about her motivation to pursue a Ph.D. degree in Computer Science at Brown University after coming back from Japan. (14:17) Genevieve shared her story of finding her research advisor (Dr. James Hays) as a graduate student. (18:44) Genevieve discussed her work building and maintaining the SUN Attributes dataset, a widely used resource for scene understanding, during the first year of her Ph.D. degree. (21:52) Genevieve talked about the paper Basic Level Scene Understanding (2013), her collaboration with researchers from MIT, Princeton, and University of Washington to build a system that can automatically understand 3D scenes from a single image. (24:32) Genevieve talked about the paper Bootstrapping Fine-grained Classifiers: Active Learning with a Crowd in the Loop presented at the NIPS conference in 2013, her collaboration with researchers from UCSD and Caltech to propose an iterative crowd-enabled active learning algorithm for building high-precision visual classifiers from unlabeled images. (28:25) Genevieve discussed her Ph.D. thesis titled “Collective Insight: Crowd-Driven Image Understanding.” (34:02) Genevieve mentioned her next career move - becoming a Postdoctoral Researcher at Microsoft Research New England. (36:40) Genevieve talked about her teaching experience for 2 graduate-level courses: Data-Driven Computer Vision at Brown University in Spring 2016 and Deep Learning For Computer Vision at Tufts University in Spring 2017. (38:04) Genevieve shared two pieces of advice for graduate students who want to make a dent in the AI/Machine Learning research community. (41:45) Genevieve went over her startup TRASH, which develops computational filmmaking tools for mobile iPhone-ographers. (43:45) Genevieve mentioned the benefit of having TRASH as part of the NYU Tandon Future Labs, a network of business incubators and accelerators that support early-stage ventures in NYC. (45:00) Genevieve talked about the research trends in computer vision, augmented reality, and scene understanding that she's most interested in at the moment. (45:59) Closing segment. Her Contact Info: Website GitHub LinkedIn Twitter CV Her Recommended Resources: The Trouble with Trusting AI to Interpret Police Body-Cam Video Microsoft Research Podcast Stanford's CS231n: Convolutional Neural Networks for Visual Recognition Yann LeCun's letter to CVPR chair after bad reviews on a Vision System that "learnt" features & reviews Paperspace CVPR 2019 Michael Black's Perceiving Systems Lab at the Max Planck Institute for Intelligent Systems Nassim Taleb's "The Black Swan"

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Simulation and Synthetic Data for Computer Vision with Batu Arisoy - TWiML Talk #281

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 9, 2019 41:36


Today we’re joined by Batu Arisoy, Research Manager with the Vision Technologies & Solutions team at Siemens Corporate Technology. Currently, Batu's research focus is solving limited-data computer vision problems, providing R&D for many of the business units throughout the company. In our conversation we discuss: an emulation of a teacher teaching students information without the use of memorization; discerning which parts of a neural network are required to make decisions; an activity recognition project with the Office of Naval Research that keeps ‘humans in the loop’; and more. The complete show notes for this episode can be found at twimlai.com/talk/281. Register for TWIMLcon: AI Platforms now at twimlcon.com! Thanks to Siemens for their sponsorship of today's episode! Check out what they’re up to, visit twimlai.com/siemens.

Leaders In AI
[CVPR2019] Deep Learning for Medical Research. Le Lu, Former NIH Staff Scientist

Leaders In AI

Play Episode Listen Later Jul 9, 2019 25:56


This episode is a live recording of our interview with Dr. Le Lu, Executive Director of the PAII Bethesda Research Lab and former NIH staff scientist, at the CVPR 2019 conference. He discussed the progress of applying deep learning in medical research and the challenges the industry is facing. View the full interview transcripts and more inspiring talks at Robin.ly: http://bit.ly/2XyTUs9 PAII Inc. is a tech-driven company focusing on smart city technologies, intelligent education programs, and advanced technical solutions for healthcare imaging and clinical informatics. Prior to PAII, Dr. Lu worked with the National Institutes of Health Clinical Center. His research focuses on improving modern medical image understanding and semantic parsing to fit into revolutionary clinical workflow practices.

Leaders In AI
[CVPR 2019 AI Talk] Gang Hua, VP & Chief Scientist @ Wormpex AI Research

Leaders In AI

Play Episode Listen Later Jul 6, 2019 24:08


This episode is a live recording of our interview with Dr. Gang Hua, VP and Chief Scientist at Wormpex AI Research, at the CVPR 2019 conference. He shared inspiring thoughts on the trends and challenges of computer vision and the future of Artificial Intelligence. View the full interview and transcripts at Robin.ly: http://bit.ly/2XshBm6 Wormpex AI Research is the research branch of Bianlifeng, a fast-growing Chinese convenience store chain. Prior to Wormpex, Gang was the director of Computer Vision Science at Microsoft and an Associate Professor at Stevens Institute of Technology. He's an IEEE Fellow, an IAPR Fellow, and an ACM Distinguished Scientist. His research focuses on computer vision, pattern recognition, machine learning, robotics, and more. Dr. Hua served as the Program Chair of the CVPR 2019 conference.

Leaders In AI
CVPR 2019 Best Paper Award Winners: Shumian Xin & Ioannis Gkioulekas @ Carnegie Mellon University

Leaders In AI

Play Episode Listen Later Jul 2, 2019 14:35


Our CVPR 2019 special AI talk features Shumian Xin and Prof. Ioannis Gkioulekas, co-authors of “A Theory of Fermat Paths for Non-Line-of-Sight Shape Reconstruction” and winners of the CVPR 2019 Best Paper Award. Shumian is a 2nd-year PhD student and Ioannis is an assistant professor, both from the Robotics Institute of Carnegie Mellon University. They shared details of the impressive results from their recent research project, the importance of the collaborative experience within the development process, Shumian's enthusiasm for computational imaging, and how their approach compares with LiDAR sensors. View full interview video at Robin.ly: http://bit.ly/2Xju4rZ

Leaders In AI
My First CVPR. Christopher Manning, Professor & Director @ Stanford AI Lab

Leaders In AI

Play Episode Listen Later Jun 28, 2019 23:41


This episode is a live recording of our interview with Professor Christopher Manning at the CVPR 2019 conference. He shared inspiring thoughts on the research trends and challenges in computer vision and natural language processing and the progression to commercialization. Visit Robin.ly to view video and full transcripts: http://bit.ly/2RJjUzQ. Dr. Christopher Manning is the Thomas M. Siebel Professor in Machine Learning in the Departments of Computer Science and Linguistics at Stanford University. He is also the Director of the Stanford Artificial Intelligence Lab. His research focuses on exploring software that can intelligently process, understand, and generate human language material. During the CVPR 2019 conference, Professor Manning was an invited speaker for the workshop “Visual Question Answering and Dialog.” A paper he advised, “GQA, a new dataset for compositional question answering over real-world images,” was also accepted at CVPR 2019.

Google Cloud Platform Podcast
NVIDIA T4 with Ian Buck and Kari Briski

Google Cloud Platform Podcast

Play Episode Listen Later Mar 26, 2019 35:59


Today on the podcast, we speak with Ian Buck and Kari Briski of NVIDIA about new updates and achievements in deep learning. Ian begins by telling hosts Jon and Mark about his first project at NVIDIA, CUDA, and how it has helped expand and pave the way for future projects in supercomputing, AI, and gaming. CUDA is used extensively in computer vision, speech and audio applications, and machine comprehension, Kari elaborates. NVIDIA recently announced their new Tensor Cores, which help users get the most out of their GPUs and make it easier to achieve peak performance. Working with the Tensor Cores, TensorFlow AMP is an acceleration layer built into the TensorFlow framework. It automatically makes the right choices for neural networks and maximizes performance, while still maintaining accuracy, with only a two-line change in a TensorFlow script (a rough sketch of what that change looks like today appears after these show notes). Just last year, NVIDIA announced their T4 GPU with Google Cloud Platform. This product is designed for inference, the other side of AI. Because AI is becoming so advanced, complicated, and fast, the GPUs on the inference side have to be able to handle the workload and produce inferences just as quickly. T4 and Google Cloud accomplish this together. Along with T4, NVIDIA has introduced TensorRT, a software framework for AI inference that's integrated into TensorFlow. Ian Buck Ian Buck is general manager and vice president of Accelerated Computing at NVIDIA. He is responsible for the company's worldwide datacenter business, including server GPUs and the enabling NVIDIA computing software for AI and HPC used by millions of developers, researchers and scientists. Buck joined NVIDIA in 2004 after completing his PhD in computer science from Stanford University, where he was development lead for Brook, the forerunner to generalized computing on GPUs. He is also the creator of CUDA, which has become the world's leading platform for accelerated parallel computing. Buck has testified before the U.S. Congress on artificial intelligence and has advised the White House on the topic. Buck also received a BSE degree in computer science from Princeton University. Kari Briski Kari Briski is a Senior Director of Accelerated Computing Software Product Management at NVIDIA. Her talents and interests include Deep Learning, Accelerated Computing, Design Thinking, and supporting women in technology. Kari is also a huge Steelers fan. Cool things of the week Kubernetes 1.14: Production-level support for Windows Nodes, Kubectl Updates, Persistent Local Volumes GA blog Stadia blog How Google Cloud helped Multiplay power a record-breaking Apex Legends launch blog Massive Entertainment hosts Tom Clancy's The Division 2 on Google Cloud Platform blog Interview NVIDIA site NVIDIA Catalog site CUDA site Tensor Cores site TensorFlow site Automatic Mixed Precision for Deep Learning site Automatic Mixed Precision for NVIDIA Tensor Core Architecture in TensorFlow blog TensorFlow 2.0 on NVIDIA GPU video NVIDIA Volta site NVIDIA T4 site WaveNet blog BERT blog Compute Engine site T4 on GCP site Webinar On Demand: Accelerate Your AI Models with Automatic Mixed-Precision Training in PyTorch site PyTorch site NVIDIA TensorRT site TensorRT 5.1 site Kubernetes site Rapids site NVIDIA GTC site Deep Learning Institute site KubeFlow Pipeline Docs site KubeFlow Pipelines on GitHub site NVIDIA RTX site Question of the week Where can we learn more about Stadia? general info developer access Where can you find us next? Mark will be at Cloud NEXT, ECGC, and IO. 
Jon may be going to Unite Shanghai and will definitely be at Cloud NEXT, ECGC, and IO. NVIDIA will be at Cloud NEXT and KubeCon, as well as International Conference on Machine Learning, The International Conference on Learning Representations, and CVPR
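The "two line change" Kari describes corresponds to TensorFlow's automatic mixed precision support. The sketch below is a rough, hedged illustration only: the exact API has evolved since this episode (the original mechanism was a TF 1.x environment-variable/graph-rewrite switch), the Keras mixed-precision policy shown here is the current TF 2.x equivalent, and the model and data are hypothetical.

import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# The headline change: pick a mixed-precision policy so matmuls and convolutions
# run in float16 on Tensor Cores while variables stay in float32 for stability.
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    layers.Dense(256, activation="relu", input_shape=(784,)),
    # Keep the final activation in float32 so the loss is computed at full precision.
    layers.Dense(10, activation="softmax", dtype="float32"),
])

# LossScaleOptimizer applies dynamic loss scaling to avoid float16 underflow
# (Keras also does this automatically for mixed_float16 when using model.fit).
optimizer = mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())
model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")

x = tf.random.normal((64, 784))
y = tf.random.uniform((64,), maxval=10, dtype=tf.int32)
model.fit(x, y, epochs=1, batch_size=32)

On Tensor Core GPUs such as the T4, this kind of change typically speeds up training of larger models at essentially the same accuracy, which is the trade-off the episode highlights.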

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Can We Train an AI to Understand Body Language? with Hanbyul Joo - TWIML Talk #180

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Sep 13, 2018 51:53


In this episode, we’re joined by Hanbyul Joo, a PhD student in the Robotics Institute at Carnegie Mellon University. Han, who is on track to complete his thesis at the end of the year, is working on what is called the “Panoptic Studio,” a multi-dimension motion capture studio with over 500 camera sensors that are used to capture human body behavior and body language. While robotic and other artificially intelligent systems can interact with humans, Han’s work focuses on understanding how humans interact and behave so that we can teach AI-based systems to react to humans more naturally. In our conversation, we discuss his CVPR best student paper award winner “Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies.” Han also shares a complete overview of the Panoptic studio, and we dig into the creation and performance of the models, and much more. For the complete show notes for this episode, visit https://twimlai.com/talk/180.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Vision Systems for Planetary Landers and Drones with Larry Matthies - TWiML Talk #171

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Aug 9, 2018 43:12


Today we’re joined by Larry Matthies, Sr. Research Scientist and head of computer vision in the mobility and robotics division at JPL. Larry joins us on the heels of two presentations at this year’s CVPR conference, the first on Onboard Stereo Vision for Drone Pursuit or Sense and Avoid and another on Vision Systems for Planetary Landers. In our conversation, we touch on both of these talks, his work on vision systems for the first iteration of Mars rovers in 2004 and the future of planetary landing projects. For the complete show notes, visit https://twimlai.com/talk/171.  

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Learning Semantically Meaningful and Actionable Representations with Ashutosh Saxena - TWiML Talk #170

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Aug 6, 2018 45:35


In this episode I'm joined by Ashutosh Saxena, a veteran of Andrew Ng’s Stanford Machine Learning Group, and co-founder and CEO of Caspar.ai. Ashutosh and I discuss his RoboBrain project, a computational system that creates semantically meaningful and actionable representations of the objects, actions and observations that a robot experiences in its environment, and allows these to be shared and queried by other robots to learn new actions. We also discuss his startup Caspar, which applies these principles to the challenge of creating smart homes. For complete show notes, visit https://twimlai.com/talk/170.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Taskonomy: Disentangling Transfer Learning for Perception (CVPR 2018 Best Paper Winner) with Amir Zamir - TWiML Talk #164

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 16, 2018 49:14


In this episode I'm joined by Amir Zamir, Postdoctoral researcher at both Stanford & UC Berkeley. Amir joins us fresh off of winning the 2018 CVPR Best Paper Award for co-authoring "Taskonomy: Disentangling Task Transfer Learning." In this work, Amir and his coauthors explore the relationships between different types of visual tasks and use this structure to better understand the types of transfer learning that will be most effective for each, resulting in what they call a “computational taxonomic map for task transfer learning.” In our conversation, we discuss the nature and consequences of the relationships that Amir and his team discovered, and how they can be used to build more effective visual systems with machine learning. Along the way Amir provides a ton of great examples and explains the various tools his team has created to illustrate these concepts.

Computer Science Channel (Video)
Manmohan Chandraker: Giving Computers the Gift of Vision

Computer Science Channel (Video)

Play Episode Listen Later Jun 18, 2018 3:26


Manmohan Chandraker is an assistant professor at the CSE department of the University of California, San Diego. He received a PhD from UC San Diego and was a postdoctoral scholar at UC Berkeley. His research interests are in computer vision, machine learning and graphics-based vision, with applications to autonomous driving and human-computer interfaces. His works have received the Marr Prize Honorable Mention for Best Paper at ICCV 2007, the 2009 CSE Dissertation Award for Best Thesis at UCSD, a PAMI special issue on best papers of CVPR 2011, the Best Paper Award at CVPR 2014 and the 2018 NSF CAREER Award. He has served as an Area Chair at CVPR, ICCV, AAAI and ICVGIP, associate editor at JAIR, senior PC member at IJCAI and tutorials chair at 3DV. Series: "UCTV Prime" [Science] [Show ID: 33421]

Computer Science Channel (Audio)
Manmohan Chandraker: Giving Computers the Gift of Vision

Computer Science Channel (Audio)

Play Episode Listen Later Jun 18, 2018 3:26


Manmohan Chandraker is an assistant professor at the CSE department of the University of California, San Diego. He received a PhD from UC San Diego and was a postdoctoral scholar at UC Berkeley. His research interests are in computer vision, machine learning and graphics-based vision, with applications to autonomous driving and human-computer interfaces. His works have received the Marr Prize Honorable Mention for Best Paper at ICCV 2007, the 2009 CSE Dissertation Award for Best Thesis at UCSD, a PAMI special issue on best papers of CVPR 2011, the Best Paper Award at CVPR 2014 and the 2018 NSF CAREER Award. He has served as an Area Chair at CVPR, ICCV, AAAI and ICVGIP, associate editor at JAIR, senior PC member at IJCAI and tutorials chair at 3DV. Series: "UCTV Prime" [Science] [Show ID: 33421]

The AI Podcast
NVIDIA Chief Scientist Bill Dally on Where AI Goes Next - Ep. 62

The AI Podcast

Play Episode Listen Later Jun 13, 2018 26:37


NVIDIA researchers are gearing up to present 19 accepted papers and posters, seven of them during speaking sessions, at the annual Computer Vision and Pattern Recognition conference next week in Salt Lake City, Utah. Joining us to discuss some of what's being presented at CVPR, and to share his perspective on the world of deep learning and AI in general is one of the pillars of the computer science world, Bill Dally, chief scientist at NVIDIA.

Google Cloud Platform Podcast
Cloud AI with Dr. Fei-Fei Li

Google Cloud Platform Podcast

Play Episode Listen Later Mar 7, 2018 30:59


Dr. Fei-Fei Li, the Chief Scientist of AI/ML at Google joins Melanie and Mark this week to talk about how Google enables businesses to solve critical problems through AI solutions. We talk about the work she is doing at Google to help reduce AI barriers to entry for enterprise, her research with Stanford combining AI and health care, where AI research is going, and her efforts to overcome one of the key challenges in AI by driving for more diversity in the field. Dr. Fei-Fei Li Dr. Fei-Fei Li is the Chief Scientist of AI/ML at Google Cloud. She is also an Associate Professor in the Computer Science Department at Stanford, and the Director of the Stanford Artificial Intelligence Lab. Dr. Fei-Fei Li's main research areas are in machine learning, deep learning, computer vision and cognitive and computational neuroscience. She has published more than 150 scientific articles in top-tier journals and conferences, including Nature, PNAS, Journal of Neuroscience, CVPR, ICCV, NIPS, ECCV, IJCV, IEEE-PAMI, etc. Dr. Fei-Fei Li obtained her B.A. degree in physics from Princeton in 1999 with High Honors, and her PhD degree in electrical engineering from California Institute of Technology (Caltech) in 2005. She joined Stanford in 2009 as an assistant professor, and was promoted to associate professor with tenure in 2012. Prior to that, she was on faculty at Princeton University (2007-2009) and University of Illinois Urbana-Champaign (2005-2006). Dr. Li is the inventor of ImageNet and the ImageNet Challenge, a critical large-scale dataset and benchmarking effort that has contributed to the latest developments in deep learning and AI. In addition to her technical contributions, she is a national leading voice for advocating diversity in STEM and AI. She is co-founder of Stanford's renowned SAILORS outreach program for high school girls and the national non-profit AI4ALL. For her work in AI, Dr. Li is a speaker at the TED2015 main conference, a recipient of the IAPR 2016 J.K. Aggarwal Prize, the 2016 nVidia Pioneer in AI Award, 2014 IBM Faculty Fellow Award, 2011 Alfred Sloan Faculty Award, 2012 Yahoo Labs FREP award, 2009 NSF CAREER award, the 2006 Microsoft Research New Faculty Fellowship and a number of Google Research awards. Work from Dr. Li's lab have been featured in a variety of popular press magazines and newspapers including New York Times, Wall Street Journal, Fortune Magazine, Science, Wired Magazine, MIT Technology Review, Financial Times, and more. She was selected as a 2017 Women in Tech by the ELLE Magazine, a 2017 Awesome Women Award by Good Housekeeping, a Global Thinker of 2015 by Foreign Policy, and one of the “Great Immigrants: The Pride of America” in 2016 by the Carnegie Foundation, past winners include Albert Einstein, Yoyo Ma, Sergey Brin, et al. 
Cool things of the week Terah Lyons appointed founding executive director of Partnership on AI article & site Fully managed export and import with Cloud Datastore now generally available blog How Color uses the new Variant Transforms tool for breakthrough clinical data science with BigQuery blog & repo Google Cloud and NCAA team up for a unique March Madness competition hosted on Kaggle blog Interview AI4All site, they are hiring and how to become a mentor Cloud AI site Cloud AutoML site Cloud Vision API site and docs Cloud Speech API site and docs Cloud Natural Language API site and docs Cloud Translation API site and docs Cloud Machine Learning Engine docs TensorFlow site, github and Dev Summit waitlist ImageNet site & Kaggle ImageNet Competition site Stanford Medicine site & Stanford Children's Hospital site Additional sample resources on Dr. Fei-Fei Li: Citations site Stanford Vision Lab site Fei-Fei Li | 2018 MAKERS Conference video Google Cloud's Li Sees Transformative Time for Enterprise video Past, Present and Future of AI / Machine Learning Google I/O video Research Symposium 2017 - Morning Keynote Address at Harker School video How we're teaching computers to understand pictures video Melinda Gates and Fei-Fei Li Want to Liberate AI from “Guy with Hoodies” article Dr. Fei-Fei Li Question of the week Where can I learn more about machine learning? Listing of some of the many resources out there in no particular order: How Google does Machine Learning coursera Machine Learning with Andrew Ng coursera and Deep Learning Specialization coursera fast.ai site Machine Learning with John W. Paisley edx Machine Learning Columbia University edx International Women's Day March 8th International Women's Day site covers information on events in your area, and additional resources. Sample of recent women in tech events to keep on radar for next year: Women Techmakers site Lesbians Who Tech site Women in Data Science Conference site Where can you find us next? Mark will be at the Game Developer's Conference | GDC in March.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
This Week in ML & AI - 7/1/16: Fatal Tesla Autopilot Crash, EU Outlawing Machine Learning & CVPR

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jul 2, 2016 35:32


This Week in Machine Learning & AI brings you the week’s most interesting and important stories from the world of machine learning and artificial intelligence. This week's show covers the first fatal Tesla autopilot crash, a new EU law that could prohibit machine learning, the AI that shot down a human fighter pilot (in simulation), the 2016 CVPR conference, 10 hot AI startups, the business implications of machine learning, cool chatbot projects and if you can believe it, even more. Show notes for this episode can be found at https://twimlai.com/7.