In this episode of SEE HEAR FEEL, Dr. Dennis Proffitt, Commonwealth Professor of Psychology, Emeritus, at the University of Virginia, shares insights from his expansive research on perception and interoception. Dr. Proffitt discusses how social, emotional, and physical factors influence how people perceive space, including practical examples from his research on the steepness of hills and the impact of physical fitness and companionship. He also delves into the significance of social relationships on perception and explores the emerging field of interoception, explaining how our internal and external perceptions shape our emotions and responses. The episode highlights how both perception and interoception are intertwined with our experiences and actions. 00:00 Introduction to Dr. Dennis Proffitt 00:49 Personal Anecdote: NASA and Perception of Hills 02:32 Research Findings: Perception and Physical Fitness 05:04 Social Relationships and Perception 08:40 Understanding Interoception 12:45 Final Thoughts on Perception and Experience
(3:40) - The Secret to Super-Fast Robot Vision This episode was brought to you by Murata, the market leader in leading-edge electronics components. Click HERE to learn more about Murata's contributions to the future of autonomous vehicles. Become a founding reader of our newsletter: read.thenextbyte.com As always, you can find these and other interesting & impactful engineering articles on Wevolver.com.
This paper does not present a novel method. Instead, it delves into an essential, yet must-know baseline in light of the latest advancements in Generative Artificial Intelligence (GenAI): the utilization of GPT-4 for visual understanding. Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks. Specifically, we explore the potential of its generated rich textual descriptions across various categories to enhance recognition performance without any training. Additionally, we evaluate its visual proficiency in directly recognizing diverse visual content. To achieve this, we conduct an extensive series of experiments, systematically quantifying the performance of GPT-4 across three modalities: images, videos, and point clouds. This comprehensive evaluation encompasses a total of 16 widely recognized benchmark datasets, providing top-1 and top-5 accuracy metrics. Our study reveals that leveraging GPT-4's advanced linguistic knowledge to generate rich descriptions markedly improves zero-shot recognition. In terms of visual proficiency, GPT-4V's average performance across 16 datasets sits roughly between the capabilities of OpenAI-CLIP's ViT-L and EVA-CLIP's ViT-E. We hope that this research will contribute valuable data points and experience for future studies. We release our code at https://github.com/whwu95/GPT4Vis. 2023: Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang https://arxiv.org/pdf/2311.15732v1.pdf
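The top-1 and top-5 accuracy metrics mentioned in this abstract can be illustrated with a minimal sketch. The function name `top_k_accuracy` and the toy scores are assumptions made for illustration only; they are not taken from the paper or its released code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample score lists (one score per class); hypothetical input format.
    labels: list of true class indices, one per sample.
    """
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Rank class indices from highest to lowest score and keep the first k.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: three samples, three classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

Top-1 counts only the single best guess, while top-5 (here top-2 on the toy data) also credits a prediction when the correct class appears anywhere in the short list of best guesses.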
We present the All-Seeing (AS) project: a large-scale dataset and model for recognizing and understanding everything in the open world. Using a scalable data engine that incorporates human feedback and efficient models in the loop, we create a new dataset (AS-1B) with over 1 billion regions annotated with semantic tags, question-answering pairs, and detailed captions. It covers a wide range of 3.5 million common and rare concepts in the real world, and has 132.2 billion tokens that describe the concepts and their attributes. Leveraging this new dataset, we develop the All-Seeing model (ASM), a unified framework for panoptic visual recognition and understanding. The model is trained with open-ended language prompts and locations, which allows it to generalize to various vision and language tasks with remarkable zero-shot performance, including region-text retrieval, region recognition, captioning, and question-answering. We hope that this project can serve as a foundation for vision-language artificial general intelligence research. Models and the dataset will be released at https://github.com/OpenGVLab/All-Seeing, and a demo can be seen at https://huggingface.co/spaces/OpenGVLab/all-seeing. 2023: Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao https://arxiv.org/pdf/2308.01907v1.pdf
Dr. Irwin Braverman conceptualized visits to an art museum and observation of an unknown painting as a useful visual exercise for doctors to improve observational skills. He touches on the backstory behind this program, what he thinks emotional intelligence is, why he doesn't think it can be taught, and having no regrets. Dr. Irwin Braverman, MD is Professor Emeritus of Dermatology at Yale University. He received his undergraduate degree from Harvard College and his MD from Yale University in 1955 and trained as an intern, resident, and fellow at Yale as well as Yale-New Haven Hospital and Medical Center. He is board certified in dermatology as well as dermatopathology. He retired from being a full-time faculty member in the Department of Dermatology in 2010 after 48 years of service. His research focused on the cutaneous microcirculation, cutaneous T-cell lymphoma, and aging. He is the author of Skin Signs of Systemic Disease.
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.01.25.524429v1?rss=1 Authors: Hayden, D. J., Finnie, P. S. B., Thomazeau, A., Li, A. Y., Cooke, S. F., Bear, M. F. Abstract: In mouse primary visual cortex (V1), familiar stimuli evoke significantly altered responses when compared to novel stimuli. This stimulus-selective response plasticity (SRP) was described originally as an increase in the magnitude of visual evoked potentials (VEPs) elicited in layer (L) 4 by familiar phase-reversing grating stimuli. SRP is dependent on NMDA receptors (NMDAR) and has been hypothesized to reflect potentiation of thalamocortical synapses in L4. However, recent evidence indicates that the synaptic modifications that manifest as SRP do not occur on L4 principal cells. To shed light on where and how SRP is induced and expressed, the present study had three related aims: (1) to confirm that NMDAR are required specifically in glutamatergic principal neurons of V1, (2) to investigate the consequences of deleting NMDAR specifically in L6, and (3) to use translaminar electrophysiological recordings to characterize SRP expression in different layers of V1. We find that knockout of NMDAR in L6 principal neurons disrupts SRP. Current-source density analysis of the VEP depth profile shows augmentation of short latency current sinks in layers 3, 4 and 6 in response to phase reversals of familiar stimuli. Multiunit recordings demonstrate that increased peak firing occurs in response to phase reversals of familiar stimuli across all layers, but that activity between phase reversals is suppressed. Together, these data reveal important aspects of the underlying phenomenology of SRP and generate new hypotheses for the expression of experience-dependent plasticity in V1. Copyrights belong to the original authors. Visit the link for more info. Podcast created by Paper Player, LLC
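For listeners unfamiliar with the current-source density analysis named in this abstract, the following is a minimal sketch of the standard second-spatial-difference estimate, assuming evenly spaced recording channels along the cortical depth. The function name, the unit conductivity, and the toy potentials are illustrative assumptions; this is not the authors' analysis pipeline.

```python
def csd_second_difference(potentials, spacing, conductivity=1.0):
    """Estimate current-source density at interior channels of a depth profile.

    potentials: field potentials at evenly spaced depths (hypothetical values).
    spacing: distance between neighbouring channels.
    conductivity: assumed uniform tissue conductivity (illustrative default).
    Uses CSD(z) ~ -sigma * (phi(z-h) - 2*phi(z) + phi(z+h)) / h**2.
    """
    csd = []
    for i in range(1, len(potentials) - 1):
        second_diff = potentials[i - 1] - 2 * potentials[i] + potentials[i + 1]
        csd.append(-conductivity * second_diff / spacing ** 2)
    return csd


# Toy depth profile: a negative deflection confined to the middle channel.
toy_profile = [0.0, 0.0, -1.0, 0.0, 0.0]
print(csd_second_difference(toy_profile, spacing=1.0))
```

The estimate is only defined for interior channels, so the output has two fewer entries than the input profile.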
In this work, we design a novel Transformer-style module, i.e., the Contextual Transformer (CoT) block, for visual recognition. This design fully capitalizes on the contextual information among input keys to guide the learning of a dynamic attention matrix and thus strengthens the capacity of visual representation. Technically, the CoT block first contextually encodes input keys via a convolution, leading to a static contextual representation of inputs. 2021: Yehao Li, Ting Yao, Yingwei Pan, Tao Mei Keywords: Instance segmentation, Object detection https://arxiv.org/pdf/2107.12292v1.pdf
#involution #computervision #attention Convolutional Neural Networks (CNNs) have dominated computer vision for almost a decade by applying two fundamental principles: Spatial agnosticism and channel-specific computations. Involution aims to invert these principles and presents a spatial-specific computation, which is also channel-agnostic. The resulting Involution Operator and RedNet architecture are a compromise between classic Convolutions and the newer Local Self-Attention architectures and perform favorably in terms of computation accuracy tradeoff when compared to either. OUTLINE: 0:00 - Intro & Overview 3:00 - Principles of Convolution 10:50 - Towards spatial-specific computations 17:00 - The Involution Operator 20:00 - Comparison to Self-Attention 25:15 - Experimental Results 30:30 - Comments & Conclusion Paper: https://arxiv.org/abs/2103.06255 Code: https://github.com/d-li14/involution Abstract: Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. In this work, we rethink the inherent principles of standard convolution for vision tasks, specifically spatial-agnostic and channel-specific. Instead, we present a novel atomic operation for deep neural networks by inverting the aforementioned design principles of convolution, coined as involution. We additionally demystify the recent popular self-attention operator and subsume it into our involution family as an over-complicated instantiation. The proposed involution operator could be leveraged as fundamental bricks to build the new generation of neural networks for visual recognition, powering different deep learning models on several prevalent benchmarks, including ImageNet classification, COCO detection and segmentation, together with Cityscapes segmentation. 
Our involution-based models improve the performance of convolutional baselines using ResNet-50 by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU absolutely while compressing the computational cost to 66%, 65%, 72%, and 57% on the above benchmarks, respectively. Code and pre-trained models for all the tasks are available at https://github.com/d-li14/involution. Authors: Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/yannic-ki... BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
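The "spatial-specific, channel-agnostic" idea from the involution episode above can be sketched in one dimension with plain Python. This is a simplified illustration and not the code from the linked repository: the function name `involution_1d`, the single linear map `weight` used to generate kernels, the use of one shared group, and the zero padding are all assumptions made to keep the example short.

```python
def involution_1d(x, weight):
    """Toy 1D involution.

    x: list of positions, each a list of C channel values.
    weight: K rows of C coefficients; a hypothetical linear map that turns the
            feature at one position into a K-tap kernel for that position.
    """
    kernel_size = len(weight)
    half = kernel_size // 2
    length = len(x)
    channels = len(x[0])
    out = []
    for i in range(length):
        # Spatial-specific: the kernel is generated from the feature at position i,
        # so every position gets its own kernel.
        kernel = [sum(w * v for w, v in zip(row, x[i])) for row in weight]
        row_out = [0.0] * channels
        for k, coeff in enumerate(kernel):
            j = i + k - half
            if 0 <= j < length:  # positions outside the input act as zero padding
                for c in range(channels):
                    # Channel-agnostic: the same coefficient is applied to every channel.
                    row_out[c] += coeff * x[j][c]
        out.append(row_out)
    return out


toy_input = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_weight = [[1, 0], [0, 1], [1, 1]]
print(involution_1d(toy_input, toy_weight))
```

A standard convolution would do the opposite: reuse one fixed kernel at every position while keeping separate weights per channel.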
MLOps community meetup #47! Last Wednesday, we talked to Adrià Romero, Founder and Instructor at ProductizeML. // Abstract: In this talk, we tackled: - Motivations and mission behind ProductizeML. - Common friction points and miscommunication between technical and management/product teams, and how to bridge these gaps. - How to define ML product roadmaps (and, more importantly, how to get them signed off by your whole team). - Best practices when managing the end-to-end ML lifecycle. // Takeaways: - Self-study guide that reviews the end-to-end ML lifecycle starting with some ML theory, data access and management, MLOps, and how to wrap up all these pieces in a viable but still lovable product. - Free and collaborative self-study guide built by professionals with experience in different stages of the ML lifecycle. // Bio: Adrià is an AI, ML, and product enthusiast with more than 4 years of professional experience on his mission to empower society with data and AI-driven solutions. Born and raised in the beautiful and sunny Barcelona, he began his journey in the AI field as an applied researcher at Florida Atlantic University, where he published some of the first deep learning works in the healthcare sector. Attracted by the idea of deploying these ideas to the real world, he then joined Triage, a healthcare startup building healthcare solutions powered by AI, such as a smartphone app able to detect over 500 skin diseases from a picture. During this time, he has given multiple talks at conferences, hospitals, and institutions such as Novartis and Google. Previously, he interned at Huawei, Schneider Electric, and Insight Center for Data Analytics. Early this year, he started crafting ProductizeML, An Instruction and Interactive Guide for Teams Building Machine Learning Products, where he and a team of AI & product specialists carefully prepare content to assist on the end-to-end ML lifecycle. 
// Final thoughts Please feel free to drop any questions you may have beforehand into our Slack channel (https://go.mlops.community/slack) Watch some old meetups on our YouTube channel: https://www.youtube.com/channel/UCG6qpjVnBTTT8wLGBygANOQ ----------- Connect With Us ✌️------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Adrià on LinkedIn: https://www.linkedin.com/in/adriaromero/ References mentioned in this episode: https://en.wikipedia.org/wiki/ImageNet https://twitter.com/productizeML https://course.productize.ml/ https://github.com/ProductizeML/gitbook https://adria756514.typeform.com/to/V4BDqjYA - Newsletter Signup https://www.buymeacoffee.com/ Timestamps: [00:00] Introduction to Adrià Romero [00:32] How did you get into tech? [02:16] ImageNet Project (Visual Recognition Challenge) [06:49] Visual Recognition with Skin Lesions [07:05] Fundamental vs Applied Research (Academia experience) [08:44] Motivation for technology [14:55] Transition to ProductizeML [19:09] ProductizeML Context [23:50] What was it that made you think that Education is probably more powerful? [24:21] ProductizeML Objective [26:55] Ethics: Do you want to put that in there later? [30:12] ProductizeML Content Format and Tools [34:07] ProductizeML Catalogue [39:28] ProductizeML Audience Target [42:54] "Buy me a coffee" platform [48:29] Do you ever foresee the educational content being more vertical-specific?
Facial recognition is being used more and more. But is it reliable? What ethical problems does it pose, and what impact does it have on society?
Part 1 of 2: Melvin Kaplan, OD, from the 1st New Haven Seminar (Gesell), 12/2/90: Visual Recognition of Psychobiological Function. Learn more at https://www.ovpjournal.org/ Questions, comments: Doc@BrightEyesTampa.com -Nate Bonilla-Warford Music Credit: Lee Rosevere - Dreaming, used under a Creative Commons non-commercial attribution license. https://leerosevere.bandcamp.com/track/dreaming-body-electric-mix
At the time of this recording, the New York Times released a report titled "As Cameras Track Detroit’s Residents, a Debate Ensues Over Racial Bias," which discussed some of the issues in machine learning such as algorithmic bias, and facial recognition software giving more false matches for black people than white people. We chat with Nashlie Sephus, CTO of Partpic, which was acquired by Amazon in 2016, and now an Applied Science Manager at Amazon Web Services, about her journey into machine learning, developing Partpic, and tackling some of the ethical issues in machine learning in her new role at Amazon. Show Links: Digital Ocean (sponsor); MongoDB (sponsor); Heroku (sponsor); TwilioQuest (sponsor); re:MARS; Algorithmic bias; Facial recognition; Hex bolt; Central processing unit (CPU); Caffe; Graphics processing unit (GPU); Shazam; Cocktail party effect; Artificial intelligence (AI); Machine learning; Partpic; CodeNewbie Survey; New York Times: As Cameras Track Detroit’s Residents, a Debate Ensues Over Racial Bias; Part Finder; Carriage bolt; NVIDIA; Apache MXNet; TensorFlow; Computer Vision; Music information retrieval; Digital signal processing
What are the main challenges when it comes to visual recognition? Why should you never use the term "facial detection"? What can customer facial expressions tell you about your product? How does DynaScan organize its research process? Tune in to find answers to these and many other burning questions about visual recognition in digital signage.
In this podcast, Pouya Bashivan, Postdoctoral Associate, Massachusetts Institute of Technology, in the Department of Brain and Cognitive Sciences, discusses brain activity and visual recognition. Bashivan's lab is particularly interested in how the brain sees, how the visual cortex recognizes objects, etc. Working with Kohitij Kar, the neuroscientists have engaged in some of the most significant testing to date of computational models that seek to approximate a brain's visual cortex. Using their models of a brain's visual neural network, Bashivan and Kar provided a new way to control individual neurons as well as populations of neurons in the middle of the network. Their research shows that certain versions of these new models are similar enough to a brain that they could perhaps be used to control brain states in animals. The MIT neuroscientist discusses upcoming developments in machine learning and artificial intelligence. He talks about self-driving cars and various new technologies, and the challenges developers and manufacturers face in regard to vision in their operating systems. Continuing, Bashivan states that the future of machine vision could be focused well beyond basic object recognition. Specifically, Bashivan discusses new models that are being trained to play digital games, which could advance to training in more realistic style games that could provide significant data on how human brains are affected by various experiences.
Sandra Mau, founder and CEO of TrademarkVision, an image recognition for brand protection startup based in Pittsburgh (USA) and Brisbane (Australia) that was recently acquired by CompuMark (Clarivate Analytics), shares her vision about how machine learning and image recognition technology is transforming the intellectual property (IP) ecosystem at a time when 40 percent of trademark filings contain images and when disruptive solutions developed by technology companies are increasingly being used by IP offices, law firms, and in-house counsel. Every two weeks, on Tuesday, Brand & New gives the floor to inspiring individuals, with a 360-degree vision, to help brand owners, intellectual property lawyers, and marketing and finance professionals (and beyond!) stay curious and agile in an ever-evolving business environment. Brand & New is a production of the International Trademark Association. Hosted by Audrey Dauvet. Contribution of M. Halle & S. Lagedamond - Music by JD Beats. To go further: About Sandra Mau (link to https://www.linkedin.com/in/sandramau/) About TrademarkVision (link to https://trademark.vision) About CompuMark (link to http://www.compumark.com) Also of interest: INTA Annual Meeting Program (link to https://www.inta.org/2019Annual/Pages/Home.aspx) INTA 2019 Annual Meeting Table Topics on Artificial Intelligence (link to https://www.inta.org/2019Annual/Program/Pages/Table-Topics.aspx) INTA Designs Committee (link to http://www.inta.org/Committees/Pages/DesignsCommittee.aspx) Entrepreneur Magazine: 8 Companies Changing How Machine Learning Is Used (link to https://www.entrepreneur.com/article/293105) World Intellectual Property Organization Presentation: Machine Learning Applied to Trademarks Classification and Search (link to https://www.wipo.int/edocs/mdocs/globalinfra/en/wipo_ip_itai_ge_18/wipo_ip_itai_ge_18_p17.pdf)
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
This week on the podcast we’re featuring a series of conversations from the AWS re:Invent conference in Las Vegas. I had a great time at this event getting caught up on the latest and greatest machine learning and AI products and services announced by AWS and its partners. In this episode we’re joined by Chris Adzima, Senior Information Analyst for the Washington County Sheriff’s Department. While Chris is not a traditional data scientist, he comes to us with a very interesting use case using AWS’s Rekognition. Chris is using Rekognition to identify suspects in the Portland area by running their mugshots through the software. In our conversation, he details how he is using Rekognition, while giving us a few use cases along the way. We discuss how bias affects the work he is doing and how they try to remove it from their process, not only from a software developer standpoint but also from a law enforcement standpoint, and what his next steps are with the Rekognition software. This was a pretty interesting discussion; I’m sure you’ll enjoy it!
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today we bring you our final interview from backstage at the NYU FutureLabs AI Summit. Our guest this week is Matt Zeiler. Matt graduated from the University of Toronto where he worked with deep learning researcher Geoffrey Hinton and went on to earn his PhD in machine learning at NYU, home of Yann LeCun. In 2013 Matt founded Clarifai, a startup whose cloud-based visual recognition system gives developers a way to integrate visual identification into their own products, and whose initial image classification algorithm achieved top 5 results in that year’s ImageNet competition. I caught up with Matt after his talk “From Research to the Real World”. Our conversation focused on the birth and growth of Clarifai, as well as the underlying deep neural network architectures that enable it. If you’ve been listening to the show for a while, you’ve heard me ask several guests how they go about evolving the architectures of their deep neural networks to enhance performance. Well, in this podcast Matt gives the most satisfying answer I’ve received to date by far. Check it out. I think you’ll enjoy it. The show notes can be found at twimlai.com/talk/22.
If you experience any technical difficulties with this video or would like to make an accessibility-related request, please send a message to digicomm@uchicago.edu. Partha Niyogi Memorial Conference: "Biologically Plausible Neural Networks for Invariant Visual Recognition". This conference is in honor of Partha Niyogi, the Louis Block Professor in Computer Science and Statistics at the University of Chicago. Partha lost his battle with cancer in October of 2010, at the age of 43. Partha made fundamental contributions to a variety of fields including language evolution, statistical inference, and speech recognition. The underlying themes of learning from observations and a rigorous basis for algorithms and models permeated his work.
Supercomputers at LANL that simulate a human's visualization process may soon boost technology to automate driving and make life generally easier and safer.
